Abstract


The  purpose  of  the  present  investigation  is  to  assess  the  feasibility  of  simulating  and 
studying  coherent  structures  in  turbulent  shear  layers,  making  use  of  Large  Eddy  Simulations 
(LES). 

Volumes  I  and  II  were  devoted  to  the  investigation  of  coherent  structures  in  LES  solutions 
obtained  using  a  structured-grid  finite-difference  code,  first  in  an  attached  turbulent  flow,  and 
then  in  the  wake  behind  a  square  cylinder. 

The  structured  grid  code  used  in  Parts  I  &  II,  despite  its  multi-domain  capability,  is 
limited  in  the  domain  geometries  it  can  handle.  The  third  part  of  the  work  was  therefore 
devoted  to  the  development  of  an  unstructured  grid  LES  code  able  to  compute  turbulent 
flows  over  arbitrary  two-dimensional  geometries. 

A  combined  finite-element/Fourier  spectral  space  discretization  scheme  was  selected,  as 
it  combines  optimally  the  geometrical  flexibility  provided  by  the  finite  element  scheme  and 
the computational efficiency resulting from the decomposition in Fourier modes in the
out-of-plane direction. Indeed, thanks to this decomposition, and a suitable treatment of the
nonlinear  convective  terms,  the  3D  flow  problem  is  transformed  into  a  series  of  2D  problems 
in  Fourier  space,  completely  decoupled  within  each  time  step.  In  addition,  the  decoupling 
allows  for  an  easy  parallelization  by  partitioning  the  work  in  Fourier  space  rather  than  physical 
space.  Also,  the  stabilized  finite  element  technique  selected  for  the  in-plane  discretization 
provides  an  accuracy  superior  to  its  finite  volume  counterpart  on  the  same  unstructured  grid 
and  for  the  same  discretization  stencil. 

The development of the code was broken down into the following steps:

1.  development  of  a  two-dimensional  unsteady  laminar  flow  solver,  and  validation; 

2.  development  of  a  three-dimensional  unsteady  laminar  flow  solver,  and  validation; 

3.  development of the three-dimensional LES code, validation, and application to an
original flow problem.

Although  originally  planned,  the  extension  of  the  coherent  structures  detection  algorithm 
developed  in  Part  I  for  the  structured  grid  LES  solver  could  not  be  carried  out  for  lack  of 
time,  as  the  development  and  testing  of  the  solver  took  more  time  than  anticipated. 

The computational results obtained with the present solver were found to be in excellent
agreement with existing experimental and computational data for the test case of the flow over a
circular cylinder. The original computation of the flow over a circular cylinder with a splitter
plate was also found to be in excellent agreement with experimental data.

These encouraging results suggest that the developed code has great potential for
further investigation of turbulent flows over general two-dimensional geometries.
Nevertheless, improvements are possible and needed. Suggestions for such improvements, and for
further investigations, are provided.

1.  INTRODUCTION


The work presented in this dissertation contributes to the numerical simulation of
unsteady, turbulent, incompressible flows via the development of a large-eddy simulation (LES)
solver  for  complex  two-dimensional  (2D)  geometries  using  a  combined  finite-element/spectral 
method.  This  mixed  spatial  discretization  coupled  with  certain  time  discretizations  reduces 
the  computational  cost,  both  in  terms  of  operation  counts  and  storage  requirements. 

This introductory chapter first presents some fundamental concepts related to the
development of the solution algorithm. A discussion of the objectives and contributions of the
work  as  well  as  a  brief  presentation  of  some  related  works  follows. 

1.1  Fundamental  Concepts 

1.1.1  Computational  Fluid  Dynamics 

Computational  fluid  dynamics  (CFD)  is  the  numerical  simulation  of  fluid-flow  systems. 
Initially  restricted  to  coarse  approximations  of  simple  problems,  CFD  has  developed  to  the 
stage  where  it  now  complements  experiments  in  complex  research  and  design  applications. 
The  two  factors  driving  CFD  forward  are  the  increasing  performance  and  decreasing  cost  of 
computer  hardware,  and  the  increasing  efficiency  of  numerical  algorithms. 

From  basic  conservation  principles,  the  partial  differential  equations  (PDEs)  governing 
fluid  flow  can  be  derived.  For  unsteady,  isothermal,  incompressible  flow  of  a  Newtonian  fluid, 
these  governing  equations  are  the  Navier-Stokes  (NS)  equations  (conservation  of  momentum) 
and  the  continuity  equation  (conservation  of  mass).  The  equations  are  formulated  in  terms 
of  the  primitive  variables,  that  is  velocity  u  and  pressure  p.  In  the  absence  of  body  forces, 
the  incompressible  NS  equations  become 


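$$\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla p + \nu \nabla^2 \mathbf{u} \qquad (1.1)$$

and the continuity equation is

$$\nabla \cdot \mathbf{u} = 0. \qquad (1.2)$$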

In the above equations, t is time, $\nu$ the kinematic viscosity, p the kinematic pressure (pressure
divided by density), and $\nabla$ the gradient operator.

Unfortunately, because of the complexity of the equations and the geometry of the
domains of interest, it is generally impossible to obtain analytical or closed-form solutions for
the flow variables (velocity, pressure, etc.). The object of CFD, therefore, is the development
of algorithms to compute an approximate solution to the mathematical problem.

This approximate solution is characterized by a finite number of unknowns, typically the
flow  variables  at  a  set  of  points  (nodes)  within  the  domain  of  interest.  Consequently,  the 
computational  domain  is  typically  divided  into  non-overlapping,  finite-dimensional  cells  or 
elements,  with  the  flow  variables  calculated  at  characteristic  points  of  the  cells  or  elements, 
such  as  the  vertices  or  centers  of  gravity. 

For  each  unknown,  an  algebraic  equation  is  derived  from  the  differential  problem  and  the 
boundary  conditions  governing  the  flow.  The  way  by  which  this  set  of  equations  is  derived 
is termed a discretization method. A number of discretization methods have been
developed, the most common being the finite-difference method (FDM), the finite-volume method
(FVM), and the finite-element method (FEM). These methods are discussed in numerous
texts,  including  (Fletcher  et  al.,  1991;  Hirsch,  1988;  Dick,  1993b, a).  A  less  common,  but 
nonetheless  valuable  discretization  is  the  spectral  method  (Zienkiewicz,  1975). 

1.1.2  Meshes 

Figure 1.1 shows some typical space discretizations (meshes) for a simple 2D circular
domain, each using approximately 175 nodes. The difference between structured, unstructured,
and  semi-structured  meshes  is  easily  observed. 

Structured meshes, Figure 1.1(a,b), possess an inherent ordered connectivity that allows
for efficient solution techniques in the FDM and FVM. That is, all mesh points lie on the
intersections of two families of curves, say a vertical mesh line i and a horizontal mesh line j.
Consequently, each grid point can be uniquely identified by a pair of integers (i, j). In terms
of a computer algorithm, all nodes in the mesh can be visited by a simple loop over i and j.
Neighboring nodes are easily identified as well, i.e. (i+1, j), (i-1, j), (i, j+1), and (i, j-1).

However,  structured  meshes  are  very  limited  in  the  geometries  that  can  be  represented  well. 
This  can  be  seen  from  the  simple  circle  example,  where  highly  skewed  elements  appear. 
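The implicit connectivity of a structured mesh can be sketched in a few lines. The following is a minimal illustration only; the function names and the rectangular ni-by-nj index range are assumptions made here and do not come from the solvers discussed in this work.

```python
def structured_neighbors(i, j, ni, nj):
    """Return the index pairs adjacent to node (i, j) on an ni-by-nj structured grid."""
    candidates = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    # Keep only the neighbours that fall inside the index range of the grid.
    return [(a, b) for a, b in candidates if 0 <= a < ni and 0 <= b < nj]


def visit_all_nodes(ni, nj):
    """Visit every node with a plain double loop; no connectivity table is stored."""
    return [(i, j) for i in range(ni) for j in range(nj)]
```

No neighbour list is ever stored: adjacency follows from index arithmetic alone, which is the source of the efficiency mentioned above.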

The most significant advantage of unstructured meshes, Figure 1.1(c,d), is the
flexibility to handle complex geometries. Additionally, unstructured meshes lend themselves very
well to solution-adaptive mesh refinement/coarsening techniques. As expected, these
advantages come at a price: unstructured meshes impose additional storage overhead because the
element  connectivity  must  be  explicitly  stored,  and  solution  algorithms  are  typically  more 
computationally  expensive. 
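By contrast, an unstructured mesh must carry its connectivity explicitly. The sketch below assumes a hypothetical two-triangle mesh of a unit square and shows that node adjacency has to be derived from the stored element table rather than from index arithmetic; the data layout is illustrative and is not the one used in this work.

```python
# Node coordinates and triangle connectivity for a unit square split into two triangles.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]  # each entry lists the three node indices of one element


def node_neighbors(triangles, n_nodes):
    """Build node-to-node adjacency from the explicitly stored element connectivity."""
    adjacency = [set() for _ in range(n_nodes)]
    for tri in triangles:
        for a in tri:
            for b in tri:
                if a != b:
                    adjacency[a].add(b)
    return [sorted(neighbors) for neighbors in adjacency]
```

The `triangles` table is the additional storage referred to above, and building the adjacency requires a pass over all elements.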

Semi-structured (sometimes termed multi-block structured) meshes, Figure 1.1(e),
provide a compromise between structured and unstructured meshes. These meshes maintain
some  inherent  ordered  connectivity  by  combining  “blocks”  of  structured  nodes  together  in  an 
unstructured  arrangement.  This  type  of  mesh  provides  an  alternative  to  unstructured  meshes, 
but  is  less  flexible  and  usually  requires  more  user  input  to  create.  In  addition,  solvers  based 
on semi-structured meshes tend to have accuracy troubles at the block interfaces.

Fig. 1.1 - Comparison of 2D meshes: (a) structured quadrilateral, (b) structured triangle, (c)
unstructured quadrilateral, (d) unstructured triangle, (e) semi-structured
(multi-block structured) quadrilateral.


Finally,  it  should  be  stated  that  though  structured  meshes  can,  of  course,  be  used  with 
the  FEM,  the  method  is  inherently  realizable  on  unstructured  meshes. 

1.1.3  The  Finite-Element  Method 

The  FEM,  just  like  the  FDM  and  FVM,  is  simply  a  method  for  approximately  solving 
PDEs.  Upon  being  introduced  to  the  FEM,  one  immediately  notices  the  somewhat  abstract 
mathematical  character  of  the  method  in  contrast  to  the  physical-type  nature  of  the  FVM. 
This  is,  in  fact,  an  advantage  to  the  method  in  that  applied  mathematicians  have  provided  a 
solid  mathematical  foundation.  And,  as  asserted  by  Gresho  (Gresho  and  Sani,  2000),  “most 
of  the  difficult  mathematics  can  be  bypassed  by  practitioners  and  even  code  builders.”  There 
are  some  definite  advantages  to  the  FEM  that  will  be  highlighted  in  this  brief  introduction 
to  the  method. 

The first characteristic of the method is that the domain is subdivided into cells, called
elements, which form a mesh. There is no restriction as to the geometric form of the elements,
though  triangle  or  quadrilateral  forms  are  typically  seen  in  two  dimensions  (tetrahedron  and 
hexahedron  are  most  common  in  three  dimensions).  The  first  and  most  important  advantage 
of  the  FEM  is  that  this  mesh  can  be  unstructured,  allowing  complex  geometries  to  be  handled 
with relative ease. The FVM shares this feature, but the traditional FDM does not.

The  elements  consist  not  only  of  nodes  and  edges,  but  include  shape  functions  (one  for 
each  node)  that  define  how  the  solution  varies  inside  the  element.  These  shape  functions 
are also commonly termed interpolation functions or trial functions. This defines a function
space to which the approximate solution of the PDE must belong. In the case of linear triangle
elements (so-called P1 elements), which are used in this work, one considers a continuous
function that varies linearly over each element, as shown in Figure 1.2.


Fig. 1.2 - Shape functions for the P1 (linear triangle) element corresponding to (a) node i,
(b) node j, and (c) node k.
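The linear shape functions of a P1 triangle coincide with the barycentric (area) coordinates of the element. The sketch below evaluates them at an arbitrary point and uses them to interpolate nodal values; the function names and argument order are assumptions chosen for illustration.

```python
def p1_shape_functions(p, a, b, c):
    """Evaluate the three linear shape functions of triangle (a, b, c) at point p."""
    (x, y), (xa, ya), (xb, yb), (xc, yc) = p, a, b, c
    # Twice the signed area of the full triangle.
    det = (xb - xa) * (yc - ya) - (xc - xa) * (yb - ya)
    # Each shape function is the ratio of a sub-triangle area to the full area.
    n_a = ((xb - x) * (yc - y) - (xc - x) * (yb - y)) / det
    n_b = ((xc - x) * (ya - y) - (xa - x) * (yc - y)) / det
    n_c = 1.0 - n_a - n_b
    return n_a, n_b, n_c


def interpolate(p, triangle, nodal_values):
    """Interpolate nodal values linearly inside the element."""
    weights = p1_shape_functions(p, *triangle)
    return sum(w * v for w, v in zip(weights, nodal_values))
```

Each function equals one at its own node and zero at the other two, and the three sum to one everywhere, so any linear field is reproduced exactly inside the element.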

The final characteristic of the FEM that will be discussed here is that the method does
not attempt to find an approximate solution to the PDE itself, but rather a solution of
some integral form of the PDE. This integral form is most commonly (but not necessarily)
obtained from a weighted residual formulation. This entails multiplying the PDE by
predefined weight functions and integrating over the domain. The Galerkin formulation is obtained
if the form of the shape functions is also employed for the weight functions. If the forms of the
shape and weight functions differ, a Petrov-Galerkin formulation is obtained. This weighted
residual approach gives rise to the second major advantage of the FEM: the ability to
naturally handle Neumann boundary conditions. Neither the FDM nor the FVM shares this feature.
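As a brief illustration of why Neumann conditions arise naturally, consider only a diffusion term $\nabla^2 u$ weighted by a function $w$ over a domain $\Omega$ with boundary $\Gamma$. These are generic symbols chosen here for illustration, not necessarily the notation of later chapters. Integration by parts gives

$$\int_\Omega w \, \nabla^2 u \, d\Omega = -\int_\Omega \nabla w \cdot \nabla u \, d\Omega + \oint_\Gamma w \, \frac{\partial u}{\partial n} \, d\Gamma .$$

The normal derivative appears explicitly in the boundary integral, so a prescribed flux $\partial u / \partial n = g$ is imposed simply by substituting $g$ into that term.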

When the Galerkin finite-element (FE) formulation is applied to the Navier-Stokes
equations, two stability problems are encountered. First, at high Reynolds numbers
(convection-dominated flow), oscillations appear in the approximate velocity solution. This phenomenon,
which  is  also  seen  with  the  central  difference  counterparts  in  the  FDM  and  FVM,  can  be 
handled  in  two  ways. 

•  Imposing a restriction on the element Peclet number: $\mathrm{Pe} = \|\mathbf{u}\| h / \nu < 2$, where h is the
characteristic element length in the direction of the local velocity u.

•  Introducing upwinding.


The  former,  restricting  Pe  <  2,  guarantees  convective  stability  but  at  the  expense  of  high 
numbers  of  elements.  That  is,  as  the  Reynolds  number  of  the  flow  is  increased,  the  mesh 
must  be  severely  refined — usually  much  more  so  than  is  required  for  adequate  resolution  of 
the  general  flowfield.  Because  of  this  drawback,  a  number  of  upwinding  schemes  have  been 
developed in the FE framework, the most common being the Streamline-Upwind
Petrov-Galerkin (SUPG) formulation. This formulation is advantageous because it maintains the
second-order  accuracy  of  the  Galerkin  formulation  and  is  also  not  subject  to  the  artificial 
cross-flow  diffusion  criticisms  associated  with  many  classical  upwind  methods. 
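The cost of the Peclet restriction can be made concrete with a short calculation. The sketch below assumes the definition Pe = |u| h / ν given above; the function names are illustrative.

```python
import math


def element_peclet(u, v, h, nu):
    """Element Peclet number for local velocity (u, v), element length h, viscosity nu."""
    return math.hypot(u, v) * h / nu


def max_element_length(u, v, nu, pe_limit=2.0):
    """Largest element length that keeps the element Peclet number at or below pe_limit."""
    return pe_limit * nu / math.hypot(u, v)
```

Because the admissible element length scales linearly with the viscosity, halving the viscosity (doubling the Reynolds number) halves the largest element length allowed wherever the velocity is unchanged.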

In addition to the above-mentioned convection-induced oscillations, Galerkin
discretizations (as well as collocated variable discretizations in the FDM and FVM) suffer from
spurious pressure oscillations due to the so-called odd-even decoupling phenomenon. In the
FE framework, these instabilities are formulated as a violation of the so-called
Babuska-Brezzi (Babuska, 1971, 1973; Brezzi, 1974) condition. The practical restriction placed by the
Babuska-Brezzi  condition  is  that  the  pressure  interpolation  must  be  of  lower  order  than  the 
velocity  interpolation.  These  spurious  pressure  oscillations  can  be  eliminated  in  one  of  two 
ways. 


•  Using  a  compatible  element  (equivalent  to  using  a  staggered-grid  approach  in  the  FDM 
and  FVM)  that  satisfies  the  Babuska-Brezzi  condition. 

•  Adding  a  small  stabilizing  term  in  the  continuity  equation  that  essentially  filters  out 
the  spurious  pressure  oscillations  (similar  to  Rhie-Chow  in  the  FVM),  bypassing  the 
Babuska-Brezzi  condition. 

Until  recently,  the  compatible  Galerkin  formulation  has  been  the  method  of  choice  to 
eliminate the spurious pressure oscillations. Unfortunately, this technique has some
implementation drawbacks, including considerably more complex data structures. A typical
compatible  element,  the  P2/P1  element,  is  shown  in  Figure  1.3.  In  this  triangle  element,  the 
velocity  interpolation  is  quadratic,  while  the  pressure  interpolation  is  linear. 

To avoid the implementation drawbacks of compatible elements, the alternative
pressure-stabilization approach has gained popularity in recent years. This latter approach is far
more  convenient  from  an  implementation  standpoint.  As  with  all  stabilization  techniques, 
though,  it  is  often  difficult  to  determine  the  precise  amount  of  stabilization  that  will  eliminate 
the  spurious  oscillations  but  not  spoil  the  accuracy  of  the  solution. 

In this work a Pressure-Stabilized Petrov-Galerkin (PSPG) formulation is used with
P1/P1 (equal-order linear triangle) elements. This formulation is a full Petrov-Galerkin
formulation in which a perturbed weight function is used to weight the continuity equation.

Fig. 1.3 - The P2/P1 element: a triangle element using quadratic interpolation for the
velocity DOFs and linear interpolation for the pressure DOFs.


The Babuska-Brezzi condition is circumvented and equal-order interpolations are allowed.
Not  only  can  the  PSPG  method  be  interpreted  as  a  Petrov-Galerkin  formulation,  but  also 
as  a  generalized  Galerkin  method  in  which  a  stabilization  term  containing  the  residual  of 
the  momentum  equation  is  added  to  the  usual  formulation.  This  formulation  is  consistent 
because  as  the  solution  converges,  the  stabilization  term  tends  to  zero. 

For  general  information  about  the  FEM  given  from  a  CFD  point  of  view,  the  reader  is 
referred  to  (Gresho  and  Sani,  2000).  The  interested  reader  is  referred  to  Chapter  2  of  this 
work, the thesis of DeMulder (DeMulder, 1997), or the article by Tezduyar (Tezduyar et al.,
1992) for more detailed information on stabilized FE formulations.

1.1.4  Spectral  Methods 

Spectral  methods  approximate  the  unknown  solution  by  means  of  a  truncated  Fourier 
series or a series of Chebyshev polynomials. The major difference between spectral
discretizations and the FDM, FVM, or FEM is that the approximations are not local (compact), but
are valid throughout the entire computational domain. As with the other methods, the
unknowns in the governing equation are replaced by the truncated series approximation. The
system of algebraic equations is then developed by means of a weighted residual approach
(as  discussed  previously)  or,  alternatively,  by  forcing  the  approximate  function  to  coincide 
with  the  exact  solution  at  a  number  of  grid  points  (termed  a  collocation  approach). 

Spectral  methods  have  the  advantage  of  being  highly  accurate  for  relatively  few  grid 
points. However, the drawbacks include increased computational cost and difficulty in
applying boundary conditions for complex computational geometries. For more information on
the  use  of  spectral  methods  in  CFD,  the  reader  is  referred  to  (Canuto  et  al.,  1988;  Hussaini 
and  Zang,  1987). 
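The accuracy claim can be illustrated with a Fourier differentiation of a periodic function on only eight points, compared against a second-order central difference. This is a minimal sketch using a naive transform written out in plain Python; it is not the transform routine used in this work, and all names are chosen for illustration.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform, normalised by the number of samples."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
            for k in range(n)]


def spectral_derivative(values, length):
    """Differentiate periodic samples by multiplying Fourier mode m by i * 2*pi*m / length."""
    n = len(values)
    coeffs = dft(values)
    result = []
    for j in range(n):
        total = 0.0
        for k in range(n):
            if n % 2 == 0 and k == n // 2:
                continue  # the Nyquist mode is dropped when differentiating
            m = k if k < n / 2 else k - n  # signed wavenumber index
            term = 1j * (2.0 * math.pi * m / length) * coeffs[k]
            total += (term * cmath.exp(2j * math.pi * k * j / n)).real
        result.append(total)
    return result


def central_difference(values, length):
    """Second-order periodic central difference, for comparison."""
    n = len(values)
    h = length / n
    return [(values[(j + 1) % n] - values[j - 1]) / (2.0 * h) for j in range(n)]


n = 8
length = 2.0 * math.pi
x = [length * j / n for j in range(n)]
samples = [math.sin(xj) for xj in x]
exact = [math.cos(xj) for xj in x]
spectral_error = max(abs(a - b) for a, b in zip(spectral_derivative(samples, length), exact))
difference_error = max(abs(a - b) for a, b in zip(central_difference(samples, length), exact))
```

For this smooth periodic function the spectral derivative is exact up to round-off, whereas the central difference on the same eight points carries a visible truncation error.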
1.1.5  Large-Eddy  Simulations

Virtually all flows of engineering interest, whether in aerodynamics or industrial
applications, are turbulent. Turbulence is a random eddying motion existing in high Reynolds
number  flows  and  containing  a  wide  range  of  eddy  sizes  and  fluctuation  frequencies.  The 
largest  eddies  have  sizes  on  the  same  order  of  magnitude  as  the  flow  domain,  low  frequencies, 
and  are  highly  affected  by  the  flow  boundaries.  The  smallest  eddies,  on  the  other  hand,  are 
on  the  order  of  the  Kolmogorov  length  scale  and  have  very  high  fluctuation  frequencies.  As 
the  Reynolds  number  of  the  flow  is  increased,  the  difference  between  the  largest  and  smallest 
eddies  increases  because  the  Kolmogorov  length  scale  decreases. 

Three principal techniques exist for handling turbulence in numerical approximations to the
Navier-Stokes equations. Direct numerical simulation (DNS), which involves numerically
solving the full unsteady Navier-Stokes equations, is currently limited to low Reynolds
numbers and only the simplest flow geometries. This is because resolving all the length and
time scales of turbulence requires computational resources that are prohibitively expensive
(or often non-existent). This is illustrated by Emmons (Emmons, 1970), who shows that
prediction of the fine details of turbulent flow in a pipe at Reynolds number $Re_D = 10^7$ would
require $10^{22}$ operations. Even if a ‘teraflop’ supercomputer ($10^{-12}$ seconds per operation)
with sufficient memory were available, the computation would still take nearly 320
years.
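The estimate follows from simple arithmetic, sketched below. The operation count and machine speed are the figures quoted above; the length of a year (365.25 days) is an assumption made here.

```python
def dns_runtime_years(operations, seconds_per_operation):
    """Wall-clock time in years for a given operation count and time per operation."""
    seconds_per_year = 365.25 * 24 * 3600
    return operations * seconds_per_operation / seconds_per_year


years = dns_runtime_years(1e22, 1e-12)
```

The total is $10^{10}$ seconds, which is a little over three centuries.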

Alternatively, the Reynolds-averaged Navier-Stokes (RANS) equations, obtained from
time-averaging the unsteady Navier-Stokes equations, require much less in terms of
computational resources and are used successfully to compute many flows of practical importance.
Unfortunately,  the  turbulence  models  used  in  conjunction  with  the  RANS  equations  are  not 
applicable  to  a  wide  range  of  flow  geometries  and  are  inadequate  for  many  turbulent  flow 
situations.  The  major  deficiency  with  the  RANS  equations  is  that  all  turbulence  fluctuations 
are  averaged,  so  the  models  must  account  for  all  scales  of  turbulence.  Researchers  have  been 
developing  turbulence  models  since  the  early  1970’s,  but  to  date  it  has  proved  impossible  to 
develop  a  universal  turbulence  model  applicable  to  all  flow  situations. 

LES  is  a  compromise  between  DNS  and  RANS.  LES  relies  on  the  fact  that  the  small 
turbulent  scales  are  nearly  isotropic  and  independent  of  the  geometry,  whereas  the  large 
turbulent  scales  are  mostly  anisotropic  and  vary  from  flow  to  flow.  The  small  scale  motion, 
which  is  primarily  an  energy  dissipation  phenomenon  and  therefore  more  universal,  is  filtered 
out of the governing equations and modeled with a sub-grid scale (SGS) model. The
large-scale motion, on the other hand, is computed directly by numerically solving the
three-dimensional (3D), unsteady, filtered Navier-Stokes equations.

The scales of wavelength smaller than the mesh spacing $\Delta x$ are removed by applying a
low-pass filter function G to the Navier-Stokes equations. The filtered variable corresponding
to some quantity f, denoted as $\bar{f}$, is defined as

$$\bar{f}(\mathbf{x}, t) = \int f(\mathbf{r}, t) \, G(\mathbf{x} - \mathbf{r}) \, d\mathbf{r}. \qquad (1.3)$$

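After applying the above filtering to the Navier-Stokes and continuity equations, one
arrives at the following set of governing equations, written in tensor notation

$$\frac{\partial \bar{u}_i}{\partial t} + \frac{\partial (\bar{u}_i \bar{u}_j)}{\partial x_j} = -\frac{\partial \bar{p}}{\partial x_i} + \nu \frac{\partial^2 \bar{u}_i}{\partial x_j \partial x_j} - \frac{\partial \tau_{ij}}{\partial x_j} \qquad (1.4)$$

$$\frac{\partial \bar{u}_i}{\partial x_i} = 0 \qquad (1.5)$$

where the subgrid-scale Reynolds stress tensor is given by

$$\tau_{ij} = \overline{u_i u_j} - \bar{u}_i \bar{u}_j. \qquad (1.6)$$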

The  main  difficulty  in  LES  is  how  to  model  this  term.  Most  SGS  models  make  a  Boussinesq 
approximation,  in  which  the  SGS  Reynolds  stress  is  assumed  proportional  to  the  super-grid, 
or  resolved,  strain.  The  resulting  models  are  termed  eddy-viscosity  models. 

Smagorinsky’s  model,  proposed  in  1963  (Smagorinski,  1963),  is  by  far  the  most  widely 
used.  In  this  model,  a  mixing-length  assumption  is  made  in  which  the  eddy  viscosity  is 
proportional to the characteristic length scale $\Delta x$ and a characteristic turbulent velocity. This
model  has  had  problems  in  reproducing  experimental  results  for  some  basic  flows,  including 
boundary  layer  transition  (Piomelli,  1994)  and  flow  past  a  backward-facing  step  (Lesieur 
and  Metais,  1996).  These  problems  are  attributed  to  the  model’s  overly-diffusive  behavior, 
especially  close  to  solid  walls. 
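In its commonly quoted form, the Smagorinsky eddy viscosity is the square of a length scale times the magnitude of the resolved strain rate. The sketch below uses that textbook form; the constant value of 0.1 and the function signature are assumptions for illustration and are not taken from the SFELES implementation.

```python
import math


def smagorinsky_eddy_viscosity(grad_u, delta, c_s=0.1):
    """Eddy viscosity (c_s * delta)**2 * |S|, with grad_u[i][j] = d u_i / d x_j (resolved field)."""
    n = len(grad_u)
    s_squared = 0.0
    for i in range(n):
        for j in range(n):
            # Resolved strain-rate tensor: symmetric part of the velocity gradient.
            s_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])
            s_squared += s_ij * s_ij
    strain_magnitude = math.sqrt(2.0 * s_squared)
    return (c_s * delta) ** 2 * strain_magnitude
```

The product of the length scale and the strain magnitude plays the role of the characteristic turbulent velocity. Because the strain magnitude is large in the mean shear near a wall, the model returns a non-zero eddy viscosity there even in laminar regions, which is consistent with the over-diffusive behavior noted above.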

More recent SGS models have been developed, such as Kraichnan’s spectral eddy
viscosity model (Kraichnan, 1976), the structure-function (SF) model (Metais and Lesieur, 1992),
and  a  class  of  models  termed  dynamic  models.  The  strengths  and  weaknesses  of  each  of  these 
models  will  not  be  discussed  here,  but  rather  in  Section  6.3.2.  If  interested  in  the  current 
state  of  SGS  models,  the  reader  is  referred  to  the  excellent  review  article  by  Lesieur  and 
Metais  (Lesieur  and  Metais,  1996). 

1.2  Objectives 

Though less computationally intensive than DNS, LES calculations are nonetheless
laborious and very much limited by current computer hardware. There are a number of
characteristics of LES that add to this high computational cost, all of which arise from the necessity
to calculate some turbulent scales. LES

•  requires very fine spatial discretizations,

•  is inherently unsteady and requires very fine temporal discretizations,

•  and is inherently 3D.

Owing  to  the  last  characteristic  listed  above,  there  is  no  true  ‘2D  LES.’  However,  one  would 
hope that simplifications exist and can be exploited for LES computations utilizing 2D
geometries.

If a 2D geometry and periodic flow are assumed in the transverse (z) direction, a
spectral discretization is well-suited in that direction. In order to accommodate complex 2D
geometries,  however,  a  FE  discretization  is  desired  in  the  2D  plane.  A  combined  FE/spectral 
discretization  method  is  ideal.  As  will  be  illustrated  later,  this  formulation  results  in  several 
benefits,  the  most  significant  of  which  are  listed  below. 

•  Using  an  unstructured  triangle  mesh  in  the  2D  plane  allows  easy  accommodation  of 
complex  2D  geometries. 

•  Storage  requirements  are  reduced  because  only  a  single  2D  mesh  need  be  stored. 

•  Computation  and  storage  requirements  are  reduced  because  element  areas  and  normal 
vectors  need  only  be  calculated  and  stored  for  a  single  2D  mesh. 

•  If  an  explicit  treatment  is  used  for  the  convective  terms,  the  Fourier  modes  are  decoupled 
within  each  time  step  and  the  problem  is  essentially  converted  to  solving  a  number  of 
independent  2D  problems  at  each  time  step.  This  situation  requires  less  computation 
than  solving  the  3D  problem  and  also  lends  itself  nicely  to  a  parallel  implementation. 

•  Because  the  flow  variables  in  physical  space  are  known  to  be  real  quantities,  symmetry 
exists in the Fourier modes. Consequently, matrix equations for only K/2 + 1 Fourier
modes  must  be  solved  (where  K  is  the  total  number  of  modes)  and  the  other  modes 
can  be  quickly  evaluated  from  the  symmetry  condition. 
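The symmetry condition in the last item can be checked on a small example. For real data of length K, the discrete Fourier coefficients satisfy conjugate symmetry, so modes K/2 + 1 to K - 1 follow from modes 1 to K/2 - 1. The sketch below uses a naive transform and illustrative names; it is not the transform used in the solver.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform, normalised by the number of samples."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
            for k in range(n)]


def independent_modes(k_total):
    """Number of Fourier modes that must actually be solved for real data."""
    return k_total // 2 + 1


def reconstruct_full_spectrum(half, k_total):
    """Fill the remaining modes from the conjugate symmetry of a real signal."""
    full = list(half)
    for k in range(len(half), k_total):
        full.append(half[k_total - k].conjugate())
    return full


signal = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1, 0.0, -2.5]  # arbitrary real samples
spectrum = dft(signal)
half = spectrum[:independent_modes(len(signal))]
rebuilt = reconstruct_full_spectrum(half, len(signal))
mismatch = max(abs(a - b) for a, b in zip(rebuilt, spectrum))
```

Only five of the eight modes are computed independently here; the remaining three are copies with the sign of the imaginary part reversed.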

The main objective of this work is the development and implementation of a
FE/spectral LES solver. By exploiting the advantages listed above, computational costs are reduced
sufficiently  to  eventually  allow  LES  calculations  on  very  complex  2D  geometries  such  as 
multi-element  high-lift  airfoils,  as  illustrated  in  Figure  1.4. 

Three  sequential  tasks  were  defined  in  order  to  achieve  the  main  objective. 

•  Development  of  a  2D  FE  incompressible  Navier-Stokes  solver  (SFE2D). 

•  Extension  of  the  2D  FE  solver  to  a  3D  combined  FE/spectral  solver  (SFE3D). 

•  Addition  of  a  subgrid-scale  model,  yielding  a  3D  FE/spectral  LES  solver  (SFELES). 


Fig.  1.4  -  A  possible  future  application  of  the  3D  finite-element/spectral  LES  algorithm  is 
that  of  high-lift  multi-element  airfoils. 


Because  this  division  of  tasks  had  a  profound  effect  on  the  development  of  the  final 
algorithm,  the  structure  of  this  dissertation  also  reflects  this  division. 

1.2.1  2D  Solver  (SFE2D) 

The  first  task  was  the  development  and  implementation  of  a  2D  FE  incompressible 
Navier-Stokes solution algorithm. The solver, named SFE2D, was developed from the ground
up and written in Fortran 90.

The space discretization is formulated using a Galerkin FE discretization with P1/P1
elements. The pressure and convective instabilities associated with the Galerkin FEM are
removed via a SUPG/PSPG formulation. This approach (SUPG/PSPG with P1/P1 elements)
was chosen because of the relative ease of implementation resulting from a simple data
structure as well as the ability to evaluate all coefficients in the matrix equation analytically.

This  spatial  discretization  results  in  second-order  spatial  accuracy.  The  SFE2D  solver 
is  also  second-order  accurate  in  time.  This  is  accomplished  by  means  of  a  consistent  mass 
matrix  and  Crank-Nicolson  treatment  of  the  pressure  and  diffusion  terms.  The  convective 
terms are handled in an explicit manner: a second-order Adams-Bashforth method is utilized
for these terms.
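The combination of an implicit Crank-Nicolson step with an explicit second-order Adams-Bashforth step can be sketched on a scalar model problem, du/dt = -a u - b u, in which the first term stands in for the implicitly treated terms and the second for the explicitly treated convective terms. The model equation, the forward Euler start-up step, and all names are assumptions made here for illustration; this is not the SFE2D discretization itself.

```python
import math


def ab2_cn_integrate(a, b, u0, dt, n_steps):
    """Advance du/dt = -a*u - b*u with Crank-Nicolson on -a*u and Adams-Bashforth 2 on -b*u."""
    u = u0
    explicit_old = -b * u0  # makes the first step reduce to forward Euler for the explicit term
    for _ in range(n_steps):
        explicit_now = -b * u
        rhs = u * (1.0 - 0.5 * dt * a) + dt * (1.5 * explicit_now - 0.5 * explicit_old)
        u_new = rhs / (1.0 + 0.5 * dt * a)  # the implicit part only needs a division here
        explicit_old = explicit_now
        u = u_new
    return u


def observed_order(a=1.0, b=1.0, t_end=2.0, n_steps=200):
    """Estimate the temporal order of accuracy by halving the time step."""
    exact = math.exp(-(a + b) * t_end)
    coarse = abs(ab2_cn_integrate(a, b, 1.0, t_end / n_steps, n_steps) - exact)
    fine = abs(ab2_cn_integrate(a, b, 1.0, t_end / (2 * n_steps), 2 * n_steps) - exact)
    return math.log(coarse / fine, 2)
```

Halving the time step should reduce the error by roughly a factor of four, which is the behavior expected of a second-order scheme. In the actual solver the division is replaced by the solution of a linear system, but the explicit term enters the right-hand side in the same way.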

Solution of the algebraic system is performed using the SPARSKIT tool kit developed
by Yousef Saad, University of Minnesota (Saad, 1994b). This library, written in Fortran 77,
was chosen because of its available compact matrix storage formats, efficient implementation
of a number of solution algorithms, and the inclusion of many preconditioning algorithms.

1.2.2  3D  Laminar  Solver  (SFE3D) 

The  3D  laminar  solver,  referred  to  as  SFE3D,  was  developed  by  extending  the  SFE2D 
solver.  The  in-plane  and  temporal  discretization  methods  are  identical  to  those  in  the  SFE2D 
solver.  The  transverse  (z)  direction  discretization  uses  a  spectral  method. 

This  solver  takes  advantage  of  the  benefits  listed  previously.  Grid  overhead  costs  are 
reduced  because  the  solver  uses  only  a  2D  mesh.  The  use  of  the  explicit  Adams-Bashforth 
method for the convective terms decouples the matrix equations for the Fourier modes,
reducing the computation cost and allowing efficient parallelization. Finally, the Fourier transform
symmetry  characteristics  are  exploited  to  reduce  the  computations. 

The  algorithm  is  parallelized  in  three  main  areas.  The  first  is  in  the  transformation 
in  the  transverse  direction  between  real  space  and  Fourier  space.  Each  set  of  nodes  with 
the  same  in-plane  coordinates  is  transformed  independently  of  all  other  sets;  therefore  this 
transformation  is  carried  out  in  parallel.  The  second  area  is  in  the  solution  of  the  independent 
linear  systems  for  each  Fourier  mode  within  each  time  step.  The  third  is  in  the  evaluation  of 
the convective terms, which is done on a plane-by-plane basis in physical space.

The major limitation imposed by parallelizing in this manner is that it must be
implemented on a shared-memory computer. The reason is that, for the parallelization of the
linear system solutions, the domain is partitioned in the transverse direction. However, once
the N independent solutions are obtained, information from all transverse planes is necessary
to evaluate the convective terms. If this were to be implemented on a distributed-memory
system, the communication between processors would be prohibitively expensive.

The  parallelization  is  implemented  using  the  OpenMP  standard.  OpenMP  is  an  emerging 
standard  for  shared-memory  parallelism  with  considerable  industry  support  from  such  large 
corporations  as  IBM,  Intel,  and  Silicon  Graphics.  This  standard  allows  for  two  basic  ‘flavors’ 
of  parallelization:  segmentation  of  the  code  into  different  sections  to  be  run  as  different 
threads,  or  execution  of  the  iterations  of  a  DO  loop  in  parallel.  The  parallelization  scheme 
implemented  in  SFES3D  is  a  rather  straightforward  application  of  the  DO  loop  flavor. 
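
The loop-level idea can be sketched as follows. This is only an illustration in Python with a thread pool standing in for an OpenMP parallel DO loop; the function name, the dense solver, and the small random systems are assumptions made for the example and do not reflect the actual solver or its matrices.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def solve_all_modes(matrices, rhs, n_threads=4):
    """Solve one independent linear system per Fourier mode.

    Each loop iteration touches only its own mode, so the iterations
    can be handed to different threads without any communication.
    """
    def solve_mode(k):
        return np.linalg.solve(matrices[k], rhs[k])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return np.array(list(pool.map(solve_mode, range(len(matrices)))))

# Hypothetical data: 6 modes, 4 unknowns per mode, diagonally shifted
# so that the systems are well conditioned.
rng = np.random.default_rng(1)
n_modes, n_dof = 6, 4
matrices = rng.standard_normal((n_modes, n_dof, n_dof)) + 5.0 * np.eye(n_dof)
rhs = rng.standard_normal((n_modes, n_dof))
solutions = solve_all_modes(matrices, rhs)
```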

1.2.3  LES  Solver  (SFELES) 

The  LES  solver,  referred  to  as  SFELES,  is  a  direct  extension  of  the  SFE3D  solver. 
The  in-plane,  transverse,  and  temporal  discretizations  are  identical  to  those  discussed  in  the 
previous  sections. 

The turbulence effects are modeled using the Smagorinsky SGS model. These terms
are evaluated using finite-elements in the 2D plane and finite-differences in the transverse
direction. Temporally, a second-order explicit Adams-Bashforth method is employed.
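
For reference, the standard form of the Smagorinsky model computes an eddy viscosity from the resolved strain rate, nu_t = (C_s * Delta)^2 * |S| with |S| = sqrt(2 S_ij S_ij). The sketch below evaluates this at a single point. The constant `c_s = 0.1`, the filter width `delta`, and the function name are assumptions for illustration; the values used in SFELES are not stated in this section.

```python
import numpy as np

def smagorinsky_viscosity(grad_u, delta, c_s=0.1):
    """Eddy viscosity from a 3x3 velocity-gradient tensor grad_u[i, j] = du_i/dx_j."""
    strain = 0.5 * (grad_u + grad_u.T)                   # S_ij
    strain_norm = np.sqrt(2.0 * np.sum(strain * strain)) # |S| = sqrt(2 S_ij S_ij)
    return (c_s * delta) ** 2 * strain_norm

# Pure shear du/dy = 4 gives |S| = 4.
grad_u = np.zeros((3, 3))
grad_u[0, 1] = 4.0
nu_t = smagorinsky_viscosity(grad_u, delta=0.5)
```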

Wall functions are not implemented in the SFELES solver. Instead, the mesh must
be sufficiently refined to resolve the near-wall gradients. Consequently, this solver cannot
be used at very high Reynolds numbers, because the number of mesh points required
grows with the Reynolds number.


1.3  Related  Works 

The  idea  of  employing  different  discretization  schemes  for  different  coordinate  directions 
is not new. There are, in fact, numerous examples of mixed FD/spectral methods published
in the scientific literature. The motivating idea behind this approach is to use spectral methods
in coordinate directions where periodicity exists, while using an FDM in coordinate directions
where more complex boundary conditions must be applied.

Wray and Hussaini (Wray and Hussaini, 1984) used this technique to perform calculations
of parallel boundary-layer transitions. They used a Fourier spectral method in two
periodic directions (streamwise and transverse) and second-order finite-differences in the normal
direction. A slightly different spectral/finite-difference method was used by Moin and
Kim (Moin and Kim, 1982) for LES of turbulent channel flow and by Biringen (Biringen,
1985) in a study of active control in channel flows. Additionally, Eidsen, Hussaini, and Zang
(Eidsen et al., 1986) used a similar algorithm in a high-resolution DNS study of a turbulent
Rayleigh-Benard flow.

Researchers in meteorology have been using so-called MSFD (Mixed Spectral Finite-Difference)
algorithms for over a decade. Example studies include Beljaars et al. (Beljaars
et al., 1987), Ayotte et al. (Ayotte et al., 1994), and Ayotte and Taylor (Ayotte and Taylor,
1995). In these cases, the horizontal coordinate directions are discretized using a spectral
method, while the vertical coordinate is discretized using finite-differences.

The above-mentioned studies used Cartesian coordinate systems. Lu et al. (Lu et al.,
1997) employed a mixed FD/spectral method with cylindrical coordinates to study oscillating
flows around a circular cylinder using LES. In this case, the circumferential and axial coordinate
directions were discretized using a spectral method while an FDM was used in the radial
direction. Yuan and Prosperetti (Yuan and Prosperetti, 1994) used a mixed FD/spectral
method with a bispherical coordinate system to study the motion of two equal spherical
bubbles moving along their line of centers.

LES solvers employing unstructured meshes, particularly the FEM, are rather uncommon.
The reason is that the computational cost of LES is very high, so very fast
algorithms, typically employing Cartesian, cylindrical, or spherical meshes, are preferred.
Unstructured meshes make it possible to simulate flows with complex geometries, however,
and are beginning to appear. Haworth and Jansen (Haworth and Jansen, 2000) developed an
unstructured LES algorithm based on the FVM. Their solver, which employed a number of
variations of the Smagorinsky SGS model, was used with some success in simulating flow in
a piston/valve system using deforming meshes. Miura et al. (Miura et al., 1999) employed a
Petrov-Galerkin FEM with linear hexahedral elements and the Smagorinsky model in solving
turbulent flow past a square cylinder. Mittal (Mittal, 2001) developed an LES algorithm
based on a SUPG/PSPG FE formulation with an additional least-squares convective stabilization.
Using linear hexahedral elements, he successfully applied this solver to flow past
low-aspect-ratio cylinders in a channel with sidewalls. Finally, Jansen (Jansen, 1999) used a
stabilized FE formulation with linear tetrahedral elements and a dynamic SGS model. The
goal of his work was to simulate flow past a finite-span wing at $Re = 1.64 \times 10^6$, but only
partial success was realized. The grid-refined simulations would have required on the order
of 70 full days to complete using all 512 processors of the CM5 supercomputer at the Army
High Performance Computing Research Center, so the study was not fully completed.

1.4  Contributions 

The SFELES solver, produced as part of this work, is an LES solver for 2D geometries
that  utilizes  a  combined  FE/spectral  discretization.  Specifically,  the  in-plane  discretization 
is  a  stabilized  FEM  with  linear  triangle  elements,  while  the  transverse  discretization  uses 
a  collocated  pseudo-spectral  approach.  The  unique  characteristics  of  this  solver  are  now 
summarized. 

• Though stabilized FE solvers have been around for some time, to the author’s knowledge,
this is the first time a stabilized FEM has been coupled with a spectral method.

• SFELES is an LES solver designed to exploit the simplifications accompanying a 2D
geometry. Although the flowfield is 3D, only a 2D mesh must be generated and stored.
Similarly, the overhead associated with the mesh, including element areas, normals, and
linkages, is computed only for the 2D mesh. Compared to a typical 3D FE solver, a
decrease in computational cost is achieved because the 3D problem is converted into a
set of independent 2D problems in Fourier space via careful treatment of the nonlinear
terms.

•  The  use  of  unstructured  meshes  with  LES  is  rather  new  due  to  the  associated  increase  in 
computational  cost.  Exploiting  the  simplifications  accompanying  2D  geometries  helps 
compensate  for  this  cost,  and  the  unstructured  mesh  allows  for  very  complex  geometries 
in  the  2D  plane. 

• The solver is parallelized using a novel approach that partitions the problem in Fourier
space rather than physical space. This results in no partitioning overhead in terms of
computation cost, and no convergence or accuracy problems related to partitioning.
Also, there is absolutely no communication between threads during the solution of the
linear systems generated at each time step.

In  addition  to  SFELES,  two  intermediate  solvers  have  also  been  developed.  SFE2D  is  a 
2D  unsteady,  laminar  NS  solver  based  on  a  stabilized  FEM  and  using  linear  triangle  elements. 
The  SFE3D  solver  is  a  3D  unsteady,  laminar  NS  solver  based  on  a  stabilized  FE/spectral 
discretization. 

The  algorithm  developed  and  implemented  in  this  work  has  been  applied  to  flow  past 
a  circular  cylinder  with  a  splitter  plate  in  the  wake  region.  This  study  has  resulted  in 
contributions  to  the  understanding  of  the  physical  mechanisms  involved  in  vortex  formation 
in  the  presence  of  splitter  plates  as  well  as  the  behavior  of  shedding  frequency  and  drag  with 
varying splitter plate lengths.

2. 2D LAMINAR FLOW SOLVER

The  purpose  of  this  chapter  is  to  explain  the  development  of  SFE2D,  a  2D,  unsteady, 
laminar  NS  solver  utilizing  a  SUPG/PSPG  FEM. 

First,  the  spatial  discretization  is  presented.  The  Galerkin  formulation  of  the  Stokes 
equations  is  presented  in  some  detail  and  applied  to  the  case  of  lid-driven  cavity  flow  in  order 
to  illustrate  its  stability  problems.  To  resolve  these  problems,  the  PSPG  formulation  of  the 
Stokes  equations  is  developed  and  applied  to  the  same  lid-driven  cavity  test  case. 

The  addition  of  the  convective  terms  to  the  Stokes  equations  results  in  the  steady  NS 
equations.  These  terms  add  a  new  instability  to  the  FEM  formulation,  placing  a  restriction 
on  the  allowable  cell  Peclet  number.  To  illustrate  this  stability  problem,  the  PSPG  algorithm 
is  applied  to  flow  past  a  rectangular  bump  in  a  channel.  To  remedy  this  stability  issue,  the 
SUPG/PSPG  formulation  of  the  steady  NS  equations  is  developed  and  applied  to  the  same 
test  case. 

Next, the temporal discretization scheme is presented. The diffusion and pressure gradient
terms in the momentum equations are discretized using a Crank-Nicolson scheme, while
the  convection  terms  use  an  explicit  Adams-Bashforth  scheme.  The  terms  added  to  the  linear 
system  are  derived  and  presented. 
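
As a rough illustration of this semi-implicit splitting, the sketch below applies Crank-Nicolson to a linear term and second-order Adams-Bashforth to a nonlinear term of a scalar model equation du/dt = -lam*u + N(u). The model equation, the function name, and the forward-Euler start-up step are assumptions made for the example; they are not the SFE2D discretization itself.

```python
import numpy as np

def cn_ab2(u0, lam, nonlinear, dt, n_steps):
    """Advance du/dt = -lam*u + N(u) with Crank-Nicolson / Adams-Bashforth 2."""
    u = u0
    n_old = nonlinear(u0)  # first step degenerates to forward Euler for N
    for _ in range(n_steps):
        n_new = nonlinear(u)
        explicit = 1.5 * n_new - 0.5 * n_old          # AB2 extrapolation of N
        # Crank-Nicolson: implicit average of the linear term
        u = (u * (1.0 - 0.5 * dt * lam) + dt * explicit) / (1.0 + 0.5 * dt * lam)
        n_old = n_new
    return u

# Bernoulli test problem u' = -2u - u^2, u(0) = 1, which has a closed-form solution.
lam, u0, t_end = 2.0, 1.0, 1.0
u_num = cn_ab2(u0, lam, lambda u: -u * u, dt=1.0e-3, n_steps=1000)
u_exact = 1.0 / ((1.0 / u0 + 1.0 / lam) * np.exp(lam * t_end) - 1.0 / lam)
```

Treating the stiff linear term implicitly and the nonlinear term explicitly keeps the system matrix independent of the solution, which is the property that later lets the Fourier modes decouple.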

Finally,  an  overview  of  SFE2D  is  given,  including  some  detail  regarding  implementation. 

2.1  Galerkin  Finite-Element  Method 

A  presentation  of  the  basic  components  of  a  Galerkin  FEM  is  given  in  this  section.  The 
complete  process  to  obtain  a  set  of  linear  equations  for  the  unknowns  at  the  nodes  is  not 
shown  here,  but  rather  saved  until  the  next  section  where  it  is  demonstrated  for  the  Stokes 
equations. 

2.1.1  Continuum  Problem 

Consider a 1D linear boundary value problem (BVP) consisting of the differential equation

\[ \mathcal{L}(u(x)) = f(x) \quad \text{for } x \in \Omega \tag{2.1} \]

and boundary conditions

\[ u = \bar{u} \ \text{ on } \Gamma_{du}, \qquad \frac{\partial u}{\partial n} \ \text{ prescribed on } \Gamma_{nu}, \tag{2.2} \]

where $\Omega$ is the domain on which (2.1) is valid and $\Gamma$ is the boundary of $\Omega$. Note that $\Gamma$ is split
into two parts, a part containing Dirichlet boundary conditions $\Gamma_{du}$ and a part containing
Neumann boundary conditions $\Gamma_{nu}$.

2.1.2  Weighted  Residual  Formulation 

The first step in arriving at a Galerkin FEM solution is to recast the continuum problem
(2.1) into its weak form, also referred to as its variational form. Although the continuum
formulation of the BVP, (2.1, 2.2), is generally unique, its weak formulation is not. The
alternative weak formulations are nevertheless equivalent as long as a solution to the BVP
exists and is sufficiently smooth. As
Gresho (Gresho and Sani, 2000) states, “Some weak formulations... are more useful than others
because (at least) they more efficiently and more ‘naturally’ take account of the [boundary
conditions].”

A finite-dimensional space $H^h$ to which the approximate solution will belong is defined.
Then, an approximate solution to the BVP (2.1, 2.2) having the form

\[ u^h(x) = \sum_{j=1}^{n} u_j N_j(x) \tag{2.3} \]

is searched for, where the $u_j$ are the so-called degrees of freedom, the unknowns of the discrete
problem. The basis functions $N_j$ are finite-order polynomials that span $H^h$. The dimension
of the function space $H^h$ is finite, $H^h = \mathrm{span}\{N_j : j = 1, 2, \ldots, n\}$, therefore (2.3) cannot in
general satisfy (2.1) at all points in the domain $\Omega$. The basis functions, however, are chosen
such that they form a complete set. That is, as $n$ grows the approximation obtained by (2.3)
improves.

Because $u^h$ cannot satisfy (2.1) exactly, if (2.3) is substituted into (2.1) a residual remains:

\[ r_\Omega = \mathcal{L}(u^h) - f \quad \text{in } \Omega. \tag{2.4} \]

A natural approach to arriving at an approximate solution to the BVP is to ensure that this
residual is somehow minimized. This can be done in a number of ways.

One approach would be to require that the square of the residual be a minimum over
the entire domain $\Omega$. That is, minimize

\[ \int_\Omega (r_\Omega)^2 \, d\Omega. \tag{2.5} \]

This is known as a least squares approach.

Another approach common to the FEM is the method of weighted residuals. In this
formulation, the residual multiplied by some weighting function is minimized over $\Omega$. That
is, minimize

\[ \int_\Omega w_i \, r_\Omega \, d\Omega. \tag{2.6} \]

The set of weighting functions $W = \{w_i : i = 1, 2, \ldots, n\}$ must form a complete set and be
the same in number as the basis functions $N_j$.


2.1.3  Galerkin  Formulation 

The Galerkin formulation is one of the many possible choices for the weighting functions.
Though the Galerkin formulation is used in this work, two other choices for weighting functions
are briefly mentioned to illustrate the generality of the FEM and its closeness to both
the FDM and FVM.

In the collocation method, the weight functions are Dirac-delta functions in $n$ points in
the domain:

\[ w_i = \delta(x - x_i); \quad i = 1, 2, \ldots, n. \tag{2.7} \]

This essentially ensures that the residual be equal to zero at a number of points in the domain,
and has much in common with the philosophy of the FDM.

Another choice of weighting functions, termed the subdomain collocation method, is to
use step-discontinuous functions of the form

\[ w_i = \begin{cases} 1 & \text{for } x_i < x < x_{i+1}, \\ 0 & \text{otherwise.} \end{cases} \tag{2.8} \]

This formulation ensures that the integral of the residual be zero on $n$ subdomains, or elements.
The FVM, which also utilizes an integral form of the continuum equation (rather
than a differential form), is a particular case of this method.

Because the requirements for the weighting functions are that they form a complete set
and be in number equal to the basis functions, an obvious choice is to use the basis functions
themselves as the weighting functions:

\[ w_i = N_i. \tag{2.9} \]

This formulation, which is the most common in the FEM, is termed the Galerkin method.
This method has the advantage of simplicity in that only one set of functions is used. Also,
using integration by parts on the differential operator allows the use of lower order basis functions.
For example, second-order equations can be approximated using linear basis functions.
Drawbacks include the need to evaluate possibly complex integrals and some stability issues
for convection dominated flows.
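
To make the Galerkin choice concrete, the following minimal sketch solves a 1D model problem, -u'' = f on (0, 1) with u(0) = u(1) = 0, using linear basis functions as both basis and weighting functions. The model problem, the constant source term, and the function name are assumptions chosen for illustration. After integration by parts only first derivatives of the basis functions appear, which is why linear elements suffice for this second-order equation.

```python
import numpy as np

def galerkin_poisson_1d(n_elements, f=1.0):
    """Galerkin FEM with linear elements for -u'' = f, u(0) = u(1) = 0."""
    n_nodes = n_elements + 1
    x = np.linspace(0.0, 1.0, n_nodes)
    K = np.zeros((n_nodes, n_nodes))
    b = np.zeros(n_nodes)
    for e in range(n_elements):
        h = x[e + 1] - x[e]
        # Element matrix: integral of N_i' N_j' over the element.
        ke = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        # Element load: integral of N_i * f over the element (f constant).
        be = f * h / 2.0 * np.ones(2)
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += ke
        b[idx] += be
    # Dirichlet nodes are prescribed, so only interior unknowns are solved.
    interior = np.arange(1, n_nodes - 1)
    u = np.zeros(n_nodes)
    u[interior] = np.linalg.solve(K[np.ix_(interior, interior)], b[interior])
    return x, u

x, u = galerkin_poisson_1d(8)
u_exact = 0.5 * x * (1.0 - x)   # analytical solution for f = 1
```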




2.1.4  Finite  Elements 

As will be shown, the FEM shares many successful concepts with the FDM. One is that
the formulation is said to have compact support: the discretized equations couple only a few
neighboring point values. In addition, the parameters of the approximate representation, the
$u_j$ in (2.3), are point values (as opposed to subdomain averages as in the FVM).

The computational domain $\Omega$ is divided into a set of non-overlapping subdomains $\Omega_e$. In
one dimension, an obvious choice for splitting up the domain would be as shown in Figure 2.1,
where the subdomains extend between two adjacent points. In this figure two sample subdomains
are highlighted: subdomain ${}^A\Omega_e$, the interval $x_{j-1} < x < x_j$, and subdomain ${}^B\Omega_e$, the
interval $x_j < x < x_{j+1}$. In two dimensions the domain is divided into simple polygons (triangles,
quadrilaterals, etc.) as depicted in Figure 1.1. Correspondingly, in three dimensions
the domain is divided into simple polyhedra.


Fig.  2.1  -  One-dimensional  elements. 


A  set  of  nodes  is  associated  to  each  subdomain,  which  typically  includes  its  vertices.  In 
addition,  nodes  may  be  located  in  the  interior  or  along  the  subdomain  faces.  Together  with 
its  set  of  nodes,  the  subdomain  is  termed  an  element. 

Each node $i$ is assigned a basis function $N_i(x)$ in 1D or $N_i(x, y)$ in 2D, which is defined
locally within the element. The basis functions are local interpolation functions within each
element. Therefore, the basis function associated to node $i$ must vanish at all other nodes of
the element $\Omega_e$. In other words, for all nodes $j$ in $\Omega_e$,

\[ N_i(x_j) = \delta_{ij} \tag{2.10} \]

in 1D, or similarly in 2D,

\[ N_i((x, y)_j) = \delta_{ij} \tag{2.11} \]

where $\delta$ is the Kronecker delta. As a result of this property, a degree of freedom, $u_j$ in (2.3),
is the value of the approximate function $u^h$ at node $j$, or

\[ u_j = u^h(x_j). \tag{2.12} \]

Again, the basis functions are defined locally on each element. The global basis function
associated to a node $j$ is simply the combination of the basis function on each element to
which node $j$ belongs. The global basis function associated to node $j$ is identically zero on
all elements that do not contain the node, thus providing compact support.

In this work, linear basis functions are utilized. Therefore, higher order functions will
not be discussed. Figure 2.2 shows 1D linear elements and the basis function associated with
node $j$. It is seen that the global basis function $N_j$ is the combination of the two local basis
functions ${}^AN_j$ and ${}^BN_j$, defined on elements A and B respectively.
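
The Kronecker-delta property and the compact support of the 1D linear basis functions can be checked numerically with a short sketch. The node positions and the helper name `hat` are hypothetical; `np.interp` is used simply because piecewise-linear interpolation of a unit nodal vector is exactly the global hat function.

```python
import numpy as np

def hat(j, nodes, x):
    """Global piecewise-linear basis function N_j evaluated at points x."""
    nodal_values = np.zeros(len(nodes))
    nodal_values[j] = 1.0          # N_j is 1 at node j and 0 at all others
    return np.interp(x, nodes, nodal_values)

nodes = np.array([0.0, 0.3, 0.7, 1.0])   # non-uniform 1D mesh (assumed)

# Row j holds N_j evaluated at every node: this should be the identity.
kronecker = np.array([hat(j, nodes, nodes) for j in range(len(nodes))])

# The basis functions sum to one at any point of the domain.
partition = sum(hat(j, nodes, 0.5) for j in range(len(nodes)))

# N_1 is supported on [0.0, 0.7] only, so it vanishes at x = 0.85.
outside_support = hat(1, nodes, 0.85)
```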


Fig. 2.2 - One-dimensional, piecewise linear basis function $N_j$ belonging to node $j$.

This work utilizes 2D linear triangle elements. These elements are given the name P1,
where P refers to the triangle shape and 1 to the order of the interpolation polynomials. The
local basis functions associated with each of the three nodes of an element are depicted in
Figure 1.2. The resulting tent-shaped, piecewise linear global basis function associated to a
node $j$ is illustrated in Figure 2.3.


Fig. 2.3 - Two-dimensional, tent-shaped, piecewise linear basis function $N_j$ belonging to node
$j$ of a mesh of P1 elements.

2.2 Spatial Discretization

The spatial discretization is best illustrated assuming steady flow. Since, as mentioned
previously, the convection terms are treated explicitly, this section will focus on the discretization
of the Stokes flow equations, also termed the creeping flow equations. These are
the limit of the Navier-Stokes equations as $Re \to 0$.

Consider, then, creeping flow in the computational domain $\Omega$ with boundary $\Gamma = \Gamma_d \cup \Gamma_n$,
where $\Gamma_d$ is the portion of the boundary subject to Dirichlet boundary conditions and $\Gamma_n$ the
portion subject to Neumann conditions. The flow is governed by the momentum equation

\[ \nu \nabla^2 \mathbf{u} - \nabla p = 0 \tag{2.13} \]

and the continuity equation

\[ \nabla \cdot \mathbf{u} = 0. \tag{2.14} \]

In the above equations, $\nu$ is the kinematic viscosity of the fluid, $\mathbf{u}$ is the 2D velocity vector
$(u, v)$, and $p$ is the kinematic pressure (pressure divided by density).

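Equations (2.13) and (2.14) are subject to Dirichlet and Neumann boundary conditions:

\[ \mathbf{u} = \bar{\mathbf{u}} \quad \text{on } \Gamma_d \tag{2.15} \]

\[ (-p[I] + \nu \nabla \mathbf{u}) \cdot \mathbf{n} = \bar{\boldsymbol{\sigma}} \quad \text{on } \Gamma_n, \tag{2.16} \]

where $\bar{\mathbf{u}}$ is the imposed Dirichlet velocity boundary condition, $[I]$ is the identity matrix, $\mathbf{n}$ is
the outward unit normal on $\Gamma$, and $\bar{\boldsymbol{\sigma}}$ is the imposed surface traction boundary condition.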

2.2.1  Galerkin  Formulation 

To obtain an FE formulation, the form of the approximate solution is first prescribed. We
will denote the approximate solution for the velocity as $(u^h, v^h)$ and the approximate solution
for the pressure as $p^h$. Adopting the form of (2.3), we define

\[ u^h = \sum_j u_j N_j(x, y) \tag{2.17} \]

\[ v^h = \sum_j v_j N_j(x, y) \tag{2.18} \]

\[ p^h = \sum_j p_j N_j(x, y) \tag{2.19} \]

where the $N_j$ are the basis functions and $u_j$, $v_j$, and $p_j$ are the unknown values of $u$, $v$, and
$p$, respectively. In this work a P1/P1 element is utilized, meaning a linear triangle element
with the DOFs for both the velocity and pressure located at the vertices of the element, as
shown in Figure 2.4.

Fig. 2.4 - The P1/P1 element: a triangle element using the same linear interpolation functions
for the velocity and pressure DOFs.


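For purposes of illustration, (2.13) and (2.14) will be written in expanded form:

\[ \nu \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right) - \frac{\partial p}{\partial x} = 0 \tag{2.20} \]

\[ \nu \left( \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} \right) - \frac{\partial p}{\partial y} = 0 \tag{2.21} \]

\[ \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0 \tag{2.22} \]

A weighted residual formulation is obtained by weighting the momentum equation by ${}^{v}w^h =
({}^{v_x}w^h, {}^{v_y}w^h)$ and the continuity equation by ${}^{p}w^h$, resulting in

\[ \int_\Omega {}^{v_x}w_i \left[ \nu \left( \frac{\partial^2 u^h}{\partial x^2} + \frac{\partial^2 u^h}{\partial y^2} \right) - \frac{\partial p^h}{\partial x} \right] d\Omega = 0 \tag{2.23} \]

\[ \int_\Omega {}^{v_y}w_i \left[ \nu \left( \frac{\partial^2 v^h}{\partial x^2} + \frac{\partial^2 v^h}{\partial y^2} \right) - \frac{\partial p^h}{\partial y} \right] d\Omega = 0 \tag{2.24} \]

\[ \int_\Omega {}^{p}w_i \left( \frac{\partial u^h}{\partial x} + \frac{\partial v^h}{\partial y} \right) d\Omega = 0 \tag{2.25} \]

for equations (2.20), (2.21), and (2.22), respectively.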


Employing the Galerkin formulation, the weighting functions are identified with the basis
functions:

\[ {}^{v_x}w_i = [N_i, 0] \tag{2.26} \]

\[ {}^{v_y}w_i = [0, N_i] \tag{2.27} \]

\[ {}^{p}w_i = N_i. \tag{2.28} \]


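Substituting (2.26)-(2.28), as well as the form of the approximate solution, (2.17)-(2.19),
into (2.23)-(2.25) gives

\[ \int_\Omega N_i \left[ \nu \left( \frac{\partial^2}{\partial x^2} \sum_j u_j N_j + \frac{\partial^2}{\partial y^2} \sum_j u_j N_j \right) - \frac{\partial}{\partial x} \sum_j p_j N_j \right] d\Omega = 0 \tag{2.29} \]

\[ \int_\Omega N_i \left[ \nu \left( \frac{\partial^2}{\partial x^2} \sum_j v_j N_j + \frac{\partial^2}{\partial y^2} \sum_j v_j N_j \right) - \frac{\partial}{\partial y} \sum_j p_j N_j \right] d\Omega = 0 \tag{2.30} \]

\[ \int_\Omega N_i \left( \frac{\partial}{\partial x} \sum_j u_j N_j + \frac{\partial}{\partial y} \sum_j v_j N_j \right) d\Omega = 0. \tag{2.31} \]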


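Because the basis (and weighting) functions are defined locally on each element, (2.29)-
(2.31) can be re-written as the sum over the elements $\Omega_e$ of the domain $\Omega$:

\[ \sum_e \int_{\Omega_e} N_i \left[ \nu \left( \frac{\partial^2}{\partial x^2} \sum_j u_j N_j + \frac{\partial^2}{\partial y^2} \sum_j u_j N_j \right) - \frac{\partial}{\partial x} \sum_j p_j N_j \right] d\Omega_e = 0 \tag{2.32} \]

\[ \sum_e \int_{\Omega_e} N_i \left[ \nu \left( \frac{\partial^2}{\partial x^2} \sum_j v_j N_j + \frac{\partial^2}{\partial y^2} \sum_j v_j N_j \right) - \frac{\partial}{\partial y} \sum_j p_j N_j \right] d\Omega_e = 0 \tag{2.33} \]

\[ \sum_e \int_{\Omega_e} N_i \left( \frac{\partial}{\partial x} \sum_j u_j N_j + \frac{\partial}{\partial y} \sum_j v_j N_j \right) d\Omega_e = 0. \tag{2.34} \]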


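Furthermore, because the nodal values are constant, they can be taken outside of the
integral. Performing this task, as well as some additional simplification and regrouping,
results in the following:

\[ \sum_e \sum_j \left[ \left( \nu \int_{\Omega_e} N_i \left( \frac{\partial^2 N_j}{\partial x^2} + \frac{\partial^2 N_j}{\partial y^2} \right) d\Omega_e \right) u_j - \left( \int_{\Omega_e} N_i \frac{\partial N_j}{\partial x} \, d\Omega_e \right) p_j \right] = 0 \tag{2.35} \]

\[ \sum_e \sum_j \left[ \left( \nu \int_{\Omega_e} N_i \left( \frac{\partial^2 N_j}{\partial x^2} + \frac{\partial^2 N_j}{\partial y^2} \right) d\Omega_e \right) v_j - \left( \int_{\Omega_e} N_i \frac{\partial N_j}{\partial y} \, d\Omega_e \right) p_j \right] = 0 \tag{2.36} \]

\[ \sum_e \sum_j \left[ \left( \int_{\Omega_e} N_i \frac{\partial N_j}{\partial x} \, d\Omega_e \right) u_j + \left( \int_{\Omega_e} N_i \frac{\partial N_j}{\partial y} \, d\Omega_e \right) v_j \right] = 0. \tag{2.37} \]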


If the integrals in (2.35)-(2.37) are evaluated, either analytically or numerically by Gauss
quadrature, one obtains a linear system of equations for the unknown values $u_j$, $v_j$, and $p_j$ as
a function of their neighboring values.

Equations (2.35) and (2.36) pose a problem if linear basis functions are used. The
Laplace operator results in the terms $\partial^2 N_j/\partial x^2$ and $\partial^2 N_j/\partial y^2$ appearing in the x- and y-momentum
equations. If $N_j$ is linear, these terms are identically zero. This problem can be side-stepped
by some mathematical manipulation of the weighted residual form of the equations. This
will be illustrated for the x-momentum equation only; the procedure is identical for the
y-momentum equation. If (2.23) is integrated by parts, one gets


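\[ \int_\Omega {}^{v_x}w_i \left( \nu \nabla^2 u^h - \frac{\partial p^h}{\partial x} \right) d\Omega = \int_\Omega \left[ \nabla \cdot \left( \nu \, {}^{v_x}w_i \nabla u^h \right) - \frac{\partial}{\partial x} \left( p^h \, {}^{v_x}w_i \right) \right] d\Omega - \int_\Omega \left( \nu \nabla {}^{v_x}w_i \cdot \nabla u^h - p^h \frac{\partial \, {}^{v_x}w_i}{\partial x} \right) d\Omega. \tag{2.38} \]

The divergence theorem states that

\[ \int_\Omega \nabla \cdot \mathbf{F} \, d\Omega = \oint_\Gamma \mathbf{F} \cdot \mathbf{n} \, d\Gamma, \tag{2.39} \]

where $\mathbf{F}$ is some vector and $\mathbf{n}$ is the outward unit normal on the boundary $\Gamma$ of $\Omega$. If this
theorem is applied to the first integral on the right hand side of (2.38), one sees that an
equivalent form of (2.23) is

\[ \oint_\Gamma \left( \nu \, {}^{v_x}w_i \frac{\partial u^h}{\partial n} - p^h \, {}^{v_x}w_i \, n_x \right) d\Gamma - \int_\Omega \left( \nu \nabla {}^{v_x}w_i \cdot \nabla u^h - p^h \frac{\partial \, {}^{v_x}w_i}{\partial x} \right) d\Omega = 0. \tag{2.40} \]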

It  is  seen  that  now  all  second  derivatives  have  been  removed  from  the  equation  and  linear 
basis  functions  can  be  used. 

The  boundary  integral  term  in  (2.40)  will  be  dropped  for  the  remainder  of  this  discussion. 
This  can  be  done  if  certain  simple  physical  boundary  conditions  are  imposed. 


• On Dirichlet velocity boundaries this poses no problem in that the term is not used
because the velocity DOFs are prescribed on these boundaries rather than being calculated
from the momentum equation. Boundaries of this type include no-slip walls and
fixed-velocity inlets.

• A natural outflow boundary condition is to impose $\partial u^h/\partial n = 0$. If we also impose that
$p^h = 0$ at this boundary, the boundary integral term is identically zero anyway.


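The final weighted residual FEM form of the governing Stokes equations (2.13) and (2.14) is

\[ \sum_e \sum_j \left[ \left( \nu \int_{\Omega_e} \left( \frac{\partial N_i}{\partial x} \frac{\partial N_j}{\partial x} + \frac{\partial N_i}{\partial y} \frac{\partial N_j}{\partial y} \right) d\Omega_e \right) u_j - \left( \int_{\Omega_e} \frac{\partial N_i}{\partial x} N_j \, d\Omega_e \right) p_j \right] = 0 \tag{2.41} \]

\[ \sum_e \sum_j \left[ \left( \nu \int_{\Omega_e} \left( \frac{\partial N_i}{\partial x} \frac{\partial N_j}{\partial x} + \frac{\partial N_i}{\partial y} \frac{\partial N_j}{\partial y} \right) d\Omega_e \right) v_j - \left( \int_{\Omega_e} \frac{\partial N_i}{\partial y} N_j \, d\Omega_e \right) p_j \right] = 0 \tag{2.42} \]

\[ \sum_e \sum_j \left[ \left( \int_{\Omega_e} N_i \frac{\partial N_j}{\partial x} \, d\Omega_e \right) u_j + \left( \int_{\Omega_e} N_i \frac{\partial N_j}{\partial y} \, d\Omega_e \right) v_j \right] = 0. \tag{2.43} \]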

To set up a matrix equation to solve the linear system, (2.41) is used for the unknown
u-DOFs, (2.42) for the v-DOFs, and (2.43) for the p-DOFs. If the typical ordering of the
DOFs (first all u-DOFs, followed by all v-DOFs and then p-DOFs) is used, the resulting
matrix equation has the following form:

\[ \begin{bmatrix} K & 0 & {}^xQ \\ 0 & K & {}^yQ \\ -{}^xQ^T & -{}^yQ^T & 0 \end{bmatrix} \begin{Bmatrix} U \\ V \\ P \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \\ 0 \end{Bmatrix}. \tag{2.44} \]

In this equation, $U = \{u_j : j = 1, 2, \ldots, n\}$, $V = \{v_j : j = 1, 2, \ldots, n\}$, and $P = \{p_j : j =
1, 2, \ldots, n\}$ are the unknown DOFs, $K$ the viscous matrix, $[{}^xQ \;\; {}^yQ]^T$ the gradient matrix, and
$[-{}^xQ^T \;\; -{}^yQ^T]$ the divergence matrix. Note that the system matrix is indefinite, meaning
an iterative method or direct method with pivoting is required to solve the system.

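In this work a node-by-node ordering of the DOFs is used, in which all velocity and
pressure unknowns at a node are ordered one after the other to form a small three-component
vector. This ordering results in a block-structured system matrix, a characteristic that can
be exploited in the system solution algorithm. We can now speak in terms of the small
characteristic 3 × 3 matrix equation associated with each node. The equation analogous to
(2.44) is

\[ \begin{bmatrix} k_{ij} & 0 & {}^xq_{ij} \\ 0 & k_{ij} & {}^yq_{ij} \\ -{}^xq_{ji} & -{}^yq_{ji} & 0 \end{bmatrix} \begin{Bmatrix} u_j \\ v_j \\ p_j \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \\ 0 \end{Bmatrix}. \tag{2.45} \]

The terms in the system matrix, which are summarized below, come from equations
(2.41), (2.42), and (2.43).

\[ k_{ij} = \nu \int_{\Omega_e} \left( \frac{\partial N_i}{\partial x} \frac{\partial N_j}{\partial x} + \frac{\partial N_i}{\partial y} \frac{\partial N_j}{\partial y} \right) d\Omega_e \tag{2.46} \]

\[ {}^xq_{ij} = -\int_{\Omega_e} \frac{\partial N_i}{\partial x} N_j \, d\Omega_e \tag{2.47} \]

\[ {}^yq_{ij} = -\int_{\Omega_e} \frac{\partial N_i}{\partial y} N_j \, d\Omega_e \tag{2.48} \]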
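
The relation between the field-by-field ordering of (2.44) and the node-by-node ordering of (2.45) is a symmetric permutation of rows and columns. The sketch below shows one way to build it; the function names and the index conventions (unknown of field `f` at node `j` stored at `f * n_nodes + j` in the first ordering and at `3 * j + f` in the second) are assumptions for illustration only.

```python
import numpy as np

def node_by_node_permutation(n_nodes, n_fields=3):
    """Return perm with perm[new_index] = old_index.

    old_index = f * n_nodes + j   (all u, then all v, then all p)
    new_index = n_fields * j + f  (u, v, p grouped at each node)
    """
    perm = np.empty(n_nodes * n_fields, dtype=int)
    for j in range(n_nodes):
        for f in range(n_fields):
            perm[n_fields * j + f] = f * n_nodes + j
    return perm

def reorder(matrix, perm):
    """Apply the same permutation to the rows and columns of a square matrix."""
    return matrix[np.ix_(perm, perm)]

n_nodes = 2
perm = node_by_node_permutation(n_nodes)
A = np.arange(36.0).reshape(6, 6)   # stand-in for a field-ordered system matrix
B = reorder(A, perm)
```

After reordering, the 3 × 3 block of `B` at block position (j, i) collects every coupling between the unknowns of node j and those of node i.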

2.2.1.1 Analytical Coefficient Evaluation

Because linear triangle elements are used, each of the coefficients in (2.45) can be easily
evaluated analytically from (2.46)-(2.48). These are given as

\[ k_{ij} = \frac{\nu}{4 S_t} \left( {}^xn_i \, {}^xn_j + {}^yn_i \, {}^yn_j \right) \tag{2.49} \]

\[ {}^xq_{ij} = -\frac{{}^xn_i}{6} \tag{2.50} \]

\[ {}^yq_{ij} = -\frac{{}^yn_i}{6} \tag{2.51} \]

In the above equations, $S_t$ is the element area, while ${}^xn_i$ and ${}^yn_i$ are the x and y components
respectively of the scaled inward element normal vector $\mathbf{n}_i$. As illustrated in Figure 2.5, the
normal vector $\mathbf{n}_i$ is perpendicular to, and scaled to the length of, side $i$ (where side $i$ is the
side opposite node $i$).
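
A minimal sketch of this analytical evaluation is given below. It relies on the fact that, for a linear triangle, the gradient of $N_i$ is the scaled inward normal of side $i$ divided by twice the element area. The function name, the counter-clockwise vertex ordering, and the sign convention for the gradient coefficients (taken here as minus the integral of $\partial N_i/\partial x \, N_j$) are assumptions made for the example.

```python
import numpy as np

def p1_element_coefficients(xy, nu):
    """Element coefficients of a linear triangle from its scaled inward normals.

    xy : (3, 2) array of vertex coordinates, ordered counter-clockwise.
    Returns the area, the normals, and the 3x3 arrays k, xq, yq.
    """
    area = 0.5 * ((xy[1, 0] - xy[0, 0]) * (xy[2, 1] - xy[0, 1])
                  - (xy[2, 0] - xy[0, 0]) * (xy[1, 1] - xy[0, 1]))
    normals = np.empty((3, 2))
    for i in range(3):
        # Side i is opposite node i and runs from node i+1 to node i+2.
        edge = xy[(i + 2) % 3] - xy[(i + 1) % 3]
        # Rotating the edge by +90 degrees points into a CCW triangle.
        normals[i] = (-edge[1], edge[0])
    k = nu * (normals @ normals.T) / (4.0 * area)
    xq = -np.tile(normals[:, [0]], (1, 3)) / 6.0   # independent of j
    yq = -np.tile(normals[:, [1]], (1, 3)) / 6.0
    return area, normals, k, xq, yq

# Unit right triangle, where N_0 = 1 - x - y, N_1 = x, N_2 = y.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
area, normals, k, xq, yq = p1_element_coefficients(vertices, nu=1.0)
```

Each row of `k` sums to zero, reflecting the fact that a constant velocity field produces no viscous force.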
Fig. 2.5 - Definition of the scaled inward normal vectors for a linear triangle element.


2.2.1.2 Galerkin Formulation Problems

When the Galerkin FEM is applied to incompressible flow, two stability problems are encountered.
First, at high Reynolds numbers (convection-dominated flow) oscillations appear
in the approximate velocity solution. This phenomenon is discussed further in Sections 2.2.2.4
and 2.2.2.5. In addition, spurious pressure oscillations appear at all Reynolds numbers due
to the so-called odd-even decoupling phenomenon. These instabilities are now discussed.

2.2.1.3 Spurious Pressure Oscillations

Galerkin discretizations of the Stokes Equations (and the incompressible Navier-Stokes
equations, for that matter) suffer from spurious pressure oscillations in the resulting approximate
solution. These oscillations occur for certain combinations of velocity and pressure
basis functions. For example, Taylor and Hood (Taylor and Hood, 1973; Hood and Taylor,
1974) experienced this problem when using finite-elements with equal-order basis functions,
i.e. P1/P1, P2/P2, Q1/Q1, etc. No oscillations are seen for certain elements utilizing lower
order basis functions for pressure than velocity, however, i.e. P2/P1, P2/P0.

These  above-mentioned  pressure  oscillations  are  due  to  the  odd-even  decoupling  phe¬ 
nomenon  and  are  seen  for  collocated  variable  discretizations  in  the  FD  and  FV  methods  as 
well.  This  phenomenon  can  be  easily  demonstrated  by  considering  a  FD  discretization  on  a 
simple  2D  domain.  Hypothetically,  one  can  assume  that  a  ‘checker-board’  pressure  field  has 
somehow  developed,  as  shown  in  Figure  2.6.  If  a  central-difference  approximation  is  used, 
the  pressure  gradient  term  dpfdx  in  the  u- momentum  equation  is  given  by 


(2.53) 


Similarly,  the  pressure  gradient  term  dp/dy  for  the  u-momentum  equation  is 

dp  _  Pi,j+ 1  ~  PiJ- 1 
dy  2A  y 

The  pressure  at  the  central  node  (i,j)  does  not  appear  in  (2.52)  and  (2.53).  If  the  pressure 
values  from  the  checker-board  pattern  in  Figure  2.6  are  substituted  into  the  above  equations, 
it  is  seen  that  all  the  discretized  pressure  gradients  are  zero  at  all  the  nodal  points.  Con¬ 
sequently,  this  pressure  field  would  yield  the  same  effect  on  the  momentum  equations  as  a 
uniform  field,  which  is  clearly  non-physical. 
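
The odd-even decoupling argument above can be checked numerically. The following is a minimal sketch, not code from the solver: the function name `central_gradients`, the 6 x 6 grid, and the two pressure values are illustrative assumptions. It evaluates the central-difference gradients at every interior node for a checker-board field and for a uniform field.

```python
def central_gradients(p, dx=1.0, dy=1.0):
    """Central-difference (dp/dx, dp/dy) at every interior node of p[i][j]."""
    grads = {}
    for i in range(1, len(p) - 1):
        for j in range(1, len(p[0]) - 1):
            dpdx = (p[i + 1][j] - p[i - 1][j]) / (2.0 * dx)
            dpdy = (p[i][j + 1] - p[i][j - 1]) / (2.0 * dy)
            grads[(i, j)] = (dpdx, dpdy)
    return grads


# Hypothetical checker-board: two values alternating from node to node.
P_HIGH, P_LOW = 100.0, 0.0
checkerboard = [[P_HIGH if (i + j) % 2 == 0 else P_LOW for j in range(6)]
                for i in range(6)]
uniform = [[50.0] * 6 for _ in range(6)]

checker_grads = central_gradients(checkerboard)
uniform_grads = central_gradients(uniform)

# The two fields are indistinguishable to the discrete gradient operator.
print(checker_grads == uniform_grads)
```

Because nodes (i+1, j) and (i-1, j) always carry the same checker-board value, the stencil never sees the alternation, which is why the momentum equations cannot damp it.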

Fig. 2.6 - A hypothetical ‘checker-board’ pressure field on a uniform finite-difference mesh.


2.2.1.4 Illustration of Spurious Pressure Oscillations

The Galerkin FEM formulation of the Stokes equations using P1/P1 elements, as summarized
in (2.45)-(2.48), is applied to the case of a lid-driven cavity. Consider a unit square
domain (l = 1) with no-slip walls as depicted in Figure 2.7a. The fluid is set in motion by
driving the upper wall to the right at a constant velocity U = 1.

The  mesh  used  in  this  test  case  is  shown  in  Figure  2.7b.  It  is  much  finer  than  necessary, 
but  will  be  utilized  for  high  Reynolds  number  flows  later.  The  unstructured  triangle  mesh 
contains  approximately  3,000  nodes  and  5,750  elements.  Nodes  are  stretched  toward  the  walls 
and  clustered  heavily  in  the  corners  to  capture  the  pressure  singularities  that  develop  at  the 
intersections  of  the  lid  and  the  side  walls. 

The resulting approximate solution is shown in Figure 2.8. The calculated velocity field,
visualized by velocity vectors in Figure 2.8(a), is oscillation-free. The calculated pressure field,
visualized by pressure contours in Figure 2.8(b), however, is completely polluted by spurious
oscillations.


Fig. 2.7 - The lid-driven cavity test case: (a) relevant geometry and boundary conditions,
and (b) unstructured triangle mesh containing approximately 3,000 nodes and 5,750
elements.


2.2.1.5 Remedying the Spurious Pressure Oscillations

In  the  FE  framework,  the  problem  of  spurious  pressure  oscillations  is  formulated  as 
a  violation  of  the  so-called  Babuska-Brezzi  (BB)  condition  (Babuska,  1971,  1973;  Brezzi, 
1974).  This  mathematical  compatibility  condition  between  the  discrete  function  spaces  for 
the  approximate  pressure  and  velocity  is  rather  abstract.  The  practical  restriction  placed  by 
this  condition,  though,  is  that  the  pressure  interpolation  must  be  of  lower  order  than  the 
velocity  interpolation.  It  should  be  noted  that  this  is  a  necessary  but  not  sufficient  constraint 
to  satisfy  the  BB  condition. 

The P1/P1 element used in this work, or any other equal-order element for that matter,
does  not  satisfy  the  stability  condition.  Due  to  the  implementation  advantages  of  equal-order 
elements  (mainly  arising  from  a  much  simpler  data  structure),  approaches  to  circumvent  the 
BB  condition  on  this  type  of  element  have  been  developed  in  recent  years.  The  approach  used 
in  this  work  is  the  Pressure  Stabilized/Petrov-Galerkin  (PSPG)  method  originally  proposed 
by  Hughes  et  al.  in  1986  (Hughes  et  al.,  1986)  and  refined  by  researchers  such  as  Tezduyar 
(Tezduyar  et  al.,  1991,  1992)  and  Franca  (Franca  and  Frey,  1992). 

2.2.2  PSPG  Formulation 

The PSPG formulation is a full Petrov-Galerkin discretization of the momentum/continuity
system. If we write the Galerkin vector weight functions as

$$W^h = (\mathbf{v}_w^h,\; p_w^h), \tag{2.54}$$

Hughes et al. (Hughes et al., 1986) found empirically that if $\tau_e$ is defined as

$$\tau_e = \frac{\alpha h_e^2}{2\nu}, \tag{2.59}$$

the spurious pressure modes are suppressed if $\alpha > 0.1$ in the case of Q1/Q1 elements.
Tezduyar (Tezduyar et al., 1991) recommends that $\tau_e$ be a function of an element Reynolds
number $Re_U$,

$$\tau_e = \frac{h_e}{2U}\,\zeta(Re_U). \tag{2.60}$$

The characteristic element Reynolds number $Re_U$ is based on a velocity $U$ and a characteristic
element length $h_e$:

$$Re_U = \frac{U h_e}{2\nu}. \tag{2.61}$$

Tezduyar uses a global flow velocity for $U$ rather than the local velocity at each element. This
is necessary because PSPG stabilization is required at all points in the flow, even where the
local Reynolds number is zero. Numerous formulations of the $\zeta$ function have been proposed
in the literature (see DeMulder et al., 1994). All of these $\zeta$ functions have the
property of being $O(Re)$ for $Re \to 0$ and $O(1)$ for $Re \to \infty$. In SFE2D, the form of
$\tau_e$ proposed by Tezduyar (2.60) is employed.
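
As a consistency check between the two definitions, the low-Reynolds-number limit of (2.60) can be worked out by hand. The sketch below rests on two assumptions: that (2.59) has the form $\tau_e = \alpha h_e^2/(2\nu)$, and that $\zeta(Re) = Re/3$ for small $Re$, which is the piecewise-linear choice this report later adopts for the solver.

```latex
% Valid for Re_U <= 3, where \zeta(Re_U) = Re_U / 3 (assumed form)
\tau_e = \frac{h_e}{2U}\,\zeta(Re_U)
       = \frac{h_e}{2U}\cdot\frac{1}{3}\cdot\frac{U h_e}{2\nu}
       = \frac{h_e^2}{12\,\nu}
       = \frac{\alpha\,h_e^2}{2\nu}
\quad\text{with}\quad \alpha = \tfrac{1}{6} \approx 0.167 .
```

Under these assumptions the velocity $U$ cancels, so the stabilization parameter stays finite and non-zero in the Stokes limit, and the equivalent $\alpha$ lies above the empirical threshold of 0.1 quoted for Q1/Q1 elements.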

In  terms  of  the  nodal  matrix  equation,  the  PSPG  formulation  contains  additional  terms 
in  the  continuity  equation,  including  a  term  on  the  diagonal.  Specifically,  (2.45)  becomes 


cqji  +  kxSPlJ 


0  Qij 

hj  y Qij 

-yqji  +  kySPij  qsPij 


(2.62) 


The  new  (stabilization)  terms  appearing  in  (2.62)  that  are  not  in  (2.45)  are  evaluated 
from  (2.56)  as 


L 


kysPii 


(  m 

LTedy 


v2a,'  da 


f  _8NidNjjr^  ,  f  _8NidNjjr, 

SPij  —  /  ts  rj  O  dQ,e  +  /  Tg  d£le 

JQe  8x  8x  Jne  8y  8y 


(2.63) 

(2.64) 

(2.65) 


2.2.2.1 Analytical Coefficient Evaluation

Because linear triangle elements are used, each of the stabilization coefficients in (2.62)
can be evaluated analytically from (2.63)-(2.65):

$$\mathit{kxSP}_{ij} = 0, \tag{2.66}$$

$$\mathit{kySP}_{ij} = 0, \tag{2.67}$$

qSPij  =  ^(Vrij  +  yniynj).  (2.68) 

The stabilization terms $\mathit{kxSP}_{ij}$ and $\mathit{kySP}_{ij}$ are zero because the basis
functions of linear elements have vanishing second derivatives, i.e.,

$$\frac{\partial^2 N_j}{\partial x^2} = \frac{\partial^2 N_j}{\partial y^2} = 0. \tag{2.69}$$
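
To make the analytical evaluation concrete, the following minimal sketch computes the constant shape-function gradients of a linear triangle and the resulting 3 x 3 element contribution to the pressure stabilization term, $\tau_e \int_{\Omega_e} \nabla N_i \cdot \nabla N_j \, d\Omega_e$. It assumes $\tau_e$ is constant over the element; the function names are illustrative and are not taken from SFE2D, and the result is written in terms of gradients rather than in the report's own coefficient notation.

```python
def linear_triangle_gradients(xy):
    """Return (signed area, [grad N_1, grad N_2, grad N_3]) for a linear triangle."""
    (x1, y1), (x2, y2), (x3, y3) = xy
    area = 0.5 * ((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    grads = [
        ((y2 - y3) / (2.0 * area), (x3 - x2) / (2.0 * area)),
        ((y3 - y1) / (2.0 * area), (x1 - x3) / (2.0 * area)),
        ((y1 - y2) / (2.0 * area), (x2 - x1) / (2.0 * area)),
    ]
    return area, grads


def pspg_pressure_block(xy, tau_e):
    """Element matrix tau_e * |A_e| * (grad N_i . grad N_j), exact for linear N."""
    area, g = linear_triangle_gradients(xy)
    return [[tau_e * abs(area) * (g[i][0] * g[j][0] + g[i][1] * g[j][1])
             for j in range(3)] for i in range(3)]


unit_triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
block = pspg_pressure_block(unit_triangle, tau_e=1.0)
for row in block:
    print(row)
```

Since the gradients are constant, the integral reduces to a product with the element area, and each row of the block sums to zero, as expected of a Laplacian-type operator acting on the pressure.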


2.2.2.2 Illustration of PSPG Stokes Solution

The  PSPG  FE  formulation  of  the  Stokes  equations,  as  summarized  in  (2.62),  is  applied  to 
the  lid-driven  cavity  test  case  described  in  Section  2.2.1.4  to  show  the  effects  of  the  pressure 
stabilization.  The  resulting  approximate  solution  is  shown  in  Figure  2.9.  The  calculated 
velocity  field,  visualized  by  velocity  vectors  in  Figure  2.9(a),  is  virtually  identical  to  that 
obtained  using  the  Galerkin  approach,  shown  in  Figure  2.8(a).  However,  the  pressure  field, 
visualized by pressure contours in Figure 2.9(b), is now seen to be smooth and
oscillation-free.

Fig. 2.9 - Stokes flow lid-driven cavity PSPG results: (a) calculated velocity vectors, and (b)
calculated pressure field (25 contours over the range -20 < p < 20).

2.2.2.3 PSPG Formulation Problems

PSPG  FE  solutions  can  still  suffer  from  convection-induced  oscillations  in  the  velocity 
field.  As  mentioned  previously,  the  Galerkin  (and  PSPG)  FEM  gives  rise  to  central-difference 
type  approximations  of  the  differential  operators.  As  expected,  these  convection-induced 
oscillations  also  plague  central-difference  FDMs  and  FVMs. 




2.2.2.4 Illustration of Convection-Induced Oscillations

To illustrate the convection-induced oscillations in the PSPG solution, the method previously
described is applied to the case of flow past a square block in a channel. The Reynolds
number, based on block height and average inlet velocity, is 200. Figure 2.10 shows the
resulting velocity field near the block computed on a coarse mesh with a constant node spacing
of 0.5 block heights. It is seen that the velocity solution contains spurious oscillations,
rendering the approximation useless.

Fig. 2.10 - Velocity vectors showing convection-induced oscillations in the PSPG solution of
flow past a square block at Re = 200.


2.2.2.5 Remedying the Convection-Induced Oscillations


The convection-induced oscillations can be handled in two ways. The first is to keep the
element Peclet number,

$$Pe = \frac{\|\mathbf{u}^h\|\, h_e}{\nu}, \tag{2.70}$$

small ($Pe < 2$ guarantees stability).


The  other  option  is  to  introduce  certain  forms  of  upwinding  (Gresho  and  Lee,  1979; 
Versteeg  and  Malalasekera,  1995).  A  number  of  upwinding  schemes  have  been  developed  in 
the  FE  framework  (Christie  et  al.,  1976;  Heinrich  et  al.,  1977;  Hughes,  1978;  Hughes  and 
Atkinson,  1978),  the  most  common  being  the  Streamline-Upwind  Petrov-Galerkin  (SUPG) 
formulation  (Brooks  and  Hughes,  1982;  Tezduyar  and  Hughes,  1983;  Mizukami,  1985).  This 
formulation  is  advantageous  because  it  is  consistent  and  not  subject  to  the  artificial  cross-flow 
diffusion  criticisms  associated  with  many  classical  upwind  methods. 


The SUPG method increases control over the convective terms by adding artificial diffusion
acting only in the streamline direction. If we write the Galerkin vector weight function
as

$$W^h = (\mathbf{v}_w^h,\; p_w^h), \tag{2.71}$$

the SUPG perturbed weight function is

$$\tilde{W}^h = W^h + \tau_{SUPG}\,(\mathbf{u}^h \cdot \nabla \mathbf{v}_w^h,\; 0). \tag{2.72}$$


The SUPG method can also be interpreted as a stabilized FE formulation in which a
stabilization term is added to the Galerkin discretization of the momentum equation. The
form of the stabilization term is

$$S_{SUPG} = \sum_e \int_{\Omega_e} \tau_{SUPG}\,(\mathbf{u}^h \cdot \nabla \mathbf{v}_w^h) \cdot R^h_{NS}\; d\Omega_e, \tag{2.73}$$

where $R^h_{NS}$ is the residual of the discrete NS momentum equation.


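The ‘intrinsic time scale’ $\tau_{SUPG}$ is defined as

$$\tau_{SUPG} = \frac{h_e}{2\|\mathbf{u}^h\|}\,\zeta(Re_u), \tag{2.74}$$

where the local element Reynolds number $Re_u$ is based on the local velocity $\mathbf{u}^h$ as

$$Re_u = \frac{\|\mathbf{u}^h\|\, h_e}{2\nu}, \tag{2.75}$$

$$\|\mathbf{u}^h\| = \sqrt{u_h^2 + v_h^2}. \tag{2.76}$$

Instead of the hydraulic diameter, the characteristic length used for $\tau_{SUPG}$ is the element
length in the direction of the local flow, or

$$h_e = 2\left(\sum_{i=1}^{NPE} \left|\mathbf{s} \cdot \nabla N_i\right|\right)^{-1}, \tag{2.77}$$

where NPE is the number of nodes per element, $\mathbf{s}$ is the unit vector in the direction of the
local velocity, and $N_i$ is the basis function associated with node i.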

The $\zeta$ function possibilities are the same as those for the PSPG method and, again, are
well presented by DeMulder et al. (DeMulder et al., 1994). For this solver $\zeta$ is defined as

$$\zeta(Re) = \begin{cases} Re/3, & 0 \le Re \le 3, \\ 1, & 3 < Re. \end{cases} \tag{2.78}$$
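
The definitions above can be combined into a short routine. The following is a minimal sketch for a single linear triangle, assuming the streamline element length is $h_e = 2 / \sum_i |\mathbf{s} \cdot \nabla N_i|$ and the time scale is $\tau_{SUPG} = h_e\,\zeta(Re_u) / (2\|\mathbf{u}^h\|)$. All function names are illustrative, and returning zero for a fluid at rest is an assumption of this sketch, not a documented choice of SFE2D.

```python
import math


def zeta(re):
    """Piecewise-linear zeta: Re/3 up to Re = 3, then 1."""
    return re / 3.0 if re <= 3.0 else 1.0


def shape_gradients(xy):
    """Constant gradients of the three linear shape functions of a triangle."""
    (x1, y1), (x2, y2), (x3, y3) = xy
    two_area = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    return [((y2 - y3) / two_area, (x3 - x2) / two_area),
            ((y3 - y1) / two_area, (x1 - x3) / two_area),
            ((y1 - y2) / two_area, (x2 - x1) / two_area)]


def streamline_length(xy, u, v):
    """Element length in the flow direction, h_e = 2 / sum_i |s . grad N_i|."""
    speed = math.hypot(u, v)
    sx, sy = u / speed, v / speed
    return 2.0 / sum(abs(sx * gx + sy * gy) for gx, gy in shape_gradients(xy))


def tau_supg(xy, u, v, nu):
    """Intrinsic time scale for one element and one local velocity (u, v)."""
    speed = math.hypot(u, v)
    if speed == 0.0:
        return 0.0  # assumption: no streamline direction, so no upwinding
    h_e = streamline_length(xy, u, v)
    re_u = speed * h_e / (2.0 * nu)
    return h_e / (2.0 * speed) * zeta(re_u)


tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(streamline_length(tri, 1.0, 0.0))   # element extent along x
print(tau_supg(tri, 1.0, 0.0, nu=0.01))   # convection-dominated element
print(tau_supg(tri, 1.0, 0.0, nu=1.0))    # diffusion-dominated element
```

In the convection-dominated case the time scale approaches $h_e / (2\|\mathbf{u}^h\|)$, while in the diffusion-dominated case it scales with $h_e^2/\nu$, so the added streamline diffusion fades where it is not needed.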


The  addition  of  this  SUPG  stabilization  term  has  the  same  effect  as  adding  upwinding 
in  the  FDM  and  FVM.  That  is,  the  values  at  upstream  nodes  are  weighted  more  heavily 
than  downstream  nodes.  This  is  illustrated  in  Figure  2.11,  which  shows  typical  Galerkin  and 
SUPG  weighting  functions  in  one  dimension. 

Fig. 2.11 - Comparison of the weighting functions for the Galerkin and SUPG methods. The
Galerkin  method  weights  upstream  and  downstream  nodes  equally,  whereas  the 
SUPG  method  weights  the  upstream  nodes  more  heavily.  The  dashed  line  in 
the  SUPG  method  shows  the  basis  functions  are  unchanged  from  the  Galerkin 
method. 

In Figures 2.12 and 2.13, numerical results are again shown for the test case of flow
past a square block. The results in Figure 2.12 are obtained by employing a SUPG/PSPG
formulation on the same coarse mesh used previously (with a constant node spacing of 0.5
block heights). It is seen that the addition of the upwinding terms has removed the convective
instabilities. In Figure 2.13, on the other hand, no upwinding is used. Instead, the mesh
has been refined sufficiently to remove the convection-induced oscillations. The mesh used
to obtain these oscillation-free results has a node spacing of 0.1 block heights.


Fig.  2.12  -  Velocity  vectors  showing  the  oscillation-free  SUPG/PSPG  coarse-mesh  velocity 
solution  of  flow  past  a  square  block  at  Re  =  200. 


2.3  Temporal  Discretization 

In  this  section  the  temporal  discretization  is  described.  Because  the  convective  terms  are 
treated  explicitly  in  time,  this  is  the  first  point  at  which  the  full  incompressible  NS  equations 
can  be  considered. 

Fig. 2.13 - Velocity vectors showing the oscillation-free PSPG fine-mesh velocity solution of
flow past a square block at Re = 200.


Consider, then, incompressible laminar flow of a Newtonian fluid in the computational
domain with boundary $\Gamma = \Gamma_D \cup \Gamma_N$, where $\Gamma_D$ is the portion of the boundary
subject to Dirichlet boundary conditions and $\Gamma_N$ the portion subject to Neumann conditions.
The flow is governed by the momentum equation

$$\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} + \nabla p - \nu \nabla^2 \mathbf{u} = 0 \tag{2.79}$$

and the continuity equation

$$\nabla \cdot \mathbf{u} = 0. \tag{2.80}$$

In the above equations, $\nu$ is the kinematic viscosity of the fluid, $\mathbf{u}$ is the 2D velocity vector
(u, v), and p is the kinematic pressure (pressure divided by density).

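Equations (2.79) and (2.80) are subject to Dirichlet and Neumann boundary conditions:

$$\mathbf{u} = \hat{\mathbf{u}} \quad \text{on } \Gamma_D, \tag{2.81}$$

$$(-p[I] + \nu \nabla \mathbf{u}) \cdot \mathbf{n} = \boldsymbol{\theta} \quad \text{on } \Gamma_N, \tag{2.82}$$

where $\hat{\mathbf{u}}$ is the imposed Dirichlet velocity boundary condition, [I] is the identity matrix,
$\mathbf{n}$ is the outward unit normal on $\Gamma$, and $\boldsymbol{\theta}$ is the imposed surface traction
boundary condition.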


2.3.1  Pressure  and  Diffusion  Terms 


It is desired to have second-order accuracy in the time discretization. For the pressure
and diffusion terms, this is accomplished by applying a Crank-Nicolson method. In terms of
a simple example first-order ODE

$$\frac{df}{dt} = \lambda f, \tag{2.83}$$

this method results in the discrete expression

$$\frac{f^{n+1} - f^n}{\Delta t} = \lambda\,\frac{f^{n+1} + f^n}{2}, \tag{2.84}$$

where $[\,\cdot\,]^{n+1}$ denotes the new time step and $[\,\cdot\,]^{n}$ the current time step.

It  is  apparent  from  (2.84)  that  the  Crank-Nicolson  method  is  based  on  central  differencing 
(centered about time $n + \tfrac{1}{2}$) and hence is second-order accurate. Due to the implicit nature
of  the  scheme,  it  is  mathematically  deemed  unconditionally  stable  for  all  time  step  values, 
meaning  that  perturbations  are  not  amplified  with  time.  In  practice,  though,  it  is  found  that 
oscillations  can  occur  for  sufficiently  large  time  step  values. 
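
Both statements can be illustrated on the model ODE $df/dt = \lambda f$. The sketch below is a minimal example with illustrative names; it uses the Crank-Nicolson amplification factor $G = (1 + \lambda\Delta t/2)/(1 - \lambda\Delta t/2)$, which follows from solving the discrete expression for $f^{n+1}/f^n$.

```python
import math


def crank_nicolson_factor(lam, dt):
    """Amplification factor f^{n+1}/f^n of Crank-Nicolson for df/dt = lam*f."""
    return (1.0 + 0.5 * lam * dt) / (1.0 - 0.5 * lam * dt)


def integrate_cn(lam, dt, nsteps, f0=1.0):
    """Advance f0 over nsteps Crank-Nicolson steps of size dt."""
    f = f0
    factor = crank_nicolson_factor(lam, dt)
    for _ in range(nsteps):
        f *= factor
    return f


lam = -1.0
exact = math.exp(lam * 1.0)
err_coarse = abs(integrate_cn(lam, 0.1, 10) - exact)
err_fine = abs(integrate_cn(lam, 0.05, 20) - exact)
print(err_coarse / err_fine)   # close to 4 for a second-order scheme

# Large step: |G| < 1 (stable) but G < 0, so the solution flips sign each step.
g_large = crank_nicolson_factor(lam, 10.0)
print(g_large)
```

Halving the time step reduces the error by roughly a factor of four, consistent with second-order accuracy. For the large step the factor is negative with magnitude below one, so the scheme remains stable but produces a decaying sign-alternating response, which is the kind of oscillation referred to above.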


2.3.2  Convective  Terms 


The convective terms are treated using a second-order explicit Adams-Bashforth method.
Again, in terms of the example first-order ODE (2.83), the method results in the discrete
expression

$$\frac{f^{n+1} - f^n}{\Delta t} = \lambda\left(\frac{3}{2} f^n - \frac{1}{2} f^{n-1}\right). \tag{2.85}$$


As  with  all  explicit  schemes,  the  Adams-Bashforth  method  imposes  a  stability  limit  on  the 
time  step  size.  Though  this  can  be  a  serious  limitation  in  some  cases,  an  explicit  treatment 
of  the  convection  terms  is  desirable  in  this  work  for  two  main  reasons. 


•  The  convective  terms  are  non-linear.  Therefore,  if  treated  explicitly,  the  system  is  linear 
within  each  time  step  and  there  is  no  need  for  multiple  iterations  within  the  time  step 
or  the  necessity  to  calculate  expensive  Jacobian  matrices. 

•  The  ultimate  intent  of  this  solver  is  to  perform  LES.  Since  LES  requires  very  small  time 
steps  anyway,  the  time  step  limitation  due  to  the  explicit  convection  treatment  is  less 
important. 
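
The accuracy and the time step restriction of the second-order Adams-Bashforth method can both be seen on the model ODE $df/dt = \lambda f$. The following minimal sketch uses illustrative names, and it starts the two-level scheme from the exact value one step before $t = 0$, which is an assumption made only to keep the example short.

```python
import math


def integrate_ab2(lam, dt, nsteps, f0=1.0):
    """Second-order Adams-Bashforth for df/dt = lam*f, started from exact data."""
    f_old = f0 * math.exp(-lam * dt)   # assumed starter: exact value at t = -dt
    f = f0
    for _ in range(nsteps):
        f_old, f = f, f + dt * lam * (1.5 * f - 0.5 * f_old)
    return f


lam = -1.0
exact = math.exp(lam * 1.0)
err_coarse = abs(integrate_ab2(lam, 0.1, 10) - exact)
err_fine = abs(integrate_ab2(lam, 0.05, 20) - exact)
print(err_coarse / err_fine)          # close to 4: second-order accurate

stable = integrate_ab2(lam, 0.5, 40)     # |lam*dt| = 0.5: decays
unstable = integrate_ab2(lam, 1.5, 40)   # |lam*dt| = 1.5: grows without bound
print(abs(stable), abs(unstable))
```

For this real, negative $\lambda$ the scheme is stable only while $|\lambda \Delta t| < 1$, so the larger step amplifies the solution even though the exact solution decays.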


2.3.3  SUPG/PSPG  Formulation  of  the  NS  Equations 

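When the previously mentioned time discretizations are applied to the 2D incompressible
NS equations, the following expression is obtained:

$$\frac{\mathbf{u}^{n+1} - \mathbf{u}^n}{\Delta t} + \frac{3}{2}(\mathbf{u}^n \cdot \nabla)\mathbf{u}^n - \frac{1}{2}(\mathbf{u}^{n-1} \cdot \nabla)\mathbf{u}^{n-1} = -\frac{1}{2}\left(\nabla p^{n+1} + \nabla p^n\right) + \frac{1}{2}\nu\left(\nabla^2 \mathbf{u}^{n+1} + \nabla^2 \mathbf{u}^n\right) \tag{2.86}$$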

Applying the SUPG/PSPG FEM to the above equation results in a matrix equation to
be solved at each time step to obtain the approximate solution. To assist in implementation,
the matrix equation is written in terms of the change in flow parameters during each time
step,

$$\delta \mathbf{u} = \mathbf{u}^{n+1} - \mathbf{u}^n, \tag{2.87}$$

$$\delta p = p^{n+1} - p^n, \tag{2.88}$$

rather than the flow parameters themselves, $\mathbf{u}^{n+1}$ and $p^{n+1}$. The resulting nodal matrix
equation is

(2.89)


All SUPG terms are treated in an explicit manner for the same reason as the convective
terms. That is, because they contain a product of unknown velocities, the terms are nonlinear.
Because all nonlinear terms are treated explicitly, the coefficients of the system matrix on the
left-hand side are all geometry-based, and are constant for a given mesh (assuming the
boundary conditions are also constant).
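
The practical consequence of this splitting can be sketched on a scalar stand-in for the discretized equations. The model $df/dt = \lambda_d f + c(f)$ below is hypothetical: the linear term plays the role of the implicitly treated (Crank-Nicolson) terms and $c(f) = -f^2$ the role of the explicitly treated (Adams-Bashforth) nonlinear terms. Written for the increment $\delta f = f^{n+1} - f^n$, the update is $(1/\Delta t - \lambda_d/2)\,\delta f = \lambda_d f^n + \tfrac{3}{2} c(f^n) - \tfrac{1}{2} c(f^{n-1})$.

```python
import math


def march_delta_form(f0, f_prev, dt, nsteps, lam_d, convective):
    """March df/dt = lam_d*f + convective(f) in increment (delta) form."""
    lhs = 1.0 / dt - 0.5 * lam_d        # constant: computed once, reused each step
    f_old, f = f_prev, f0
    for _ in range(nsteps):
        rhs = lam_d * f + 1.5 * convective(f) - 0.5 * convective(f_old)
        delta = rhs / lhs               # stands in for the linear solve
        f_old, f = f, f + delta
    return f


def exact(t):
    """Exact solution of df/dt = -f - f**2 with f(0) = 1."""
    return 1.0 / (2.0 * math.exp(t) - 1.0)


dt = 0.01
f_end = march_delta_form(exact(0.0), exact(-dt), dt, 100, -1.0, lambda f: -f * f)
print(f_end, exact(1.0))
```

Only the right-hand side depends on the solution, so the left-hand coefficient (the system matrix in the full problem) is formed once. The increment also tends to zero as the time step shrinks, which makes the zero vector a natural starting guess for an iterative solve.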


Without  explicitly  writing  the  complete  derivation,  the  new  terms  in  the  matrix  equation 
are  summarized  below. 
(2.90)

(2.91) 

(2.92) 

(2.93) 

(2.94) 

(2.95) 
P„  f  (*dNi  ^dNiW&Uj.-

Sui  ~  Jne  Tsvpg  \  dx  +V  dy)[  At  Nj+ 


f  A  dNj  A  dNj  \  dNj]  „ 
dNj  A  dNj  \  aJVj-1

dfi« 


n^+^+Pr^ 


(2.96) 


(2.97) 


2.3.3.1 Analytical Coefficient Evaluation

Because linear triangle elements are used, each of the coefficients in (2.89) can be evaluated
analytically. The new terms introduced in this section, (2.90)-(2.95), become

(2.99)
(2.100) 
(2.101) 
(2.102) 
(2.103) 


(2.104) 


(2.105) 


where $\delta$ is the Kronecker delta.


2.4  Linear  System  Solution 

The SPARSKIT tool kit, developed by Yousef Saad, University of Minnesota (Saad,
1994b), was used to solve the linear system of algebraic equations generated by the SUPG/PSPG
FEM. The kit includes a number of routines written in Fortran 77 for manipulating and working
with sparse matrices, including iterative algorithms for solving the matrix equation

$$[A]\{x\} = \{b\}, \tag{2.106}$$

where [A] is a user-supplied n x n sparse matrix, {b} is a user-supplied vector of length n,
and {x} is a vector of unknowns of length n to be computed.

SPARSKIT includes a number of Krylov iterative methods (Saad, 1981) used in conjunction
with various preconditioners. In particular, satisfactory results in terms of robustness
and  speed  have  been  obtained  using  a  restarted  GMRES  algorithm  (Saad  and  Schultz,  1986) 
with  an  ILUT  preconditioner  (Saad,  1994a). 

A selection of sparse matrix storage formats is available to describe the system matrix.
Though  it  does  not  exploit  the  block  structure  of  the  matrix,  the  compressed  sparse  row 
format  (Appendix  A)  is  used  inside  SFE2D  because  it  is  the  format  SPARSKIT  uses  for  its 
internal  computations. 


2.5  SFE2D  Summary 

This section provides a detailed summary of the implementation of the SUPG/PSPG
FEM in the SFE2D NS solver.

The solver finds an approximate solution to the 2D unsteady NS equations in terms of
the primitive variables u, p. When the governing equations (2.79)-(2.80) are discretized, a
system of linear equations is formed that must be solved to obtain the approximate solution.
Because the SFE2D solver uses a nodal ordering of the unknowns, when this linear system
is written in matrix form, the matrix equation to be solved can be characterized by a
3x3 system associated with each node, (2.89).

All the coefficients appearing in (2.89) are obtained by looping through the nodes of each
element and summing the contributions. Each of the coefficient contributions is evaluated
from analytical expressions derived from the SUPG/PSPG FEM. These expressions are given in
(2.49)-(2.51), (2.66)-(2.68), and (2.98)-(2.103).

Figure  2.14  presents  a  flow  chart  giving  the  basic  structure  of  the  SFE2D  solver.  Each 
of  the  blocks  in  this  chart  will  now  be  discussed  in  more  detail. 

2.5.0.1.1 Initialize.

The  user-defined  parameters  for  SFE2D  are  given  via  a  Fortran  NAMELIST  file.  These 
parameters  contain  such  information  as  time  step  size,  number  of  time  steps,  convergence 
criteria, sparse matrix solver parameters, and Dirichlet boundary condition values.
Additionally, input and output filenames are obtained and the corresponding files are opened.

2.5.0.1.2 Read grid and flow variables.

The  input  grid  file  is  read.  The  format  of  this  file  is  the  DPlot  format  developed  at 
the  von  Karman  Institute  and  is  described  in  Appendix  B.  This  file  contains  the  2D  nodal 
coordinates  and  connectivity  as  well  as  flow  values  at  each  node. 

2.5.0.1.3 Create CSR matrix structure from the connectivity information.

The  structure  of  the  sparse  system  matrix  (the  location  of  the  non-zero  entries)  can  be 
determined  from  the  mesh  connectivity  information  read  from  the  DPlot  input  file.  This 
structure is stored in the ia and ja arrays as described in Appendix A.
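
The step above can be sketched as follows. This is a minimal illustration, not the SFE2D routine: the function name `csr_structure` is hypothetical, the arrays are written with 1-based indices on the assumption that they follow the Fortran convention, and the sketch builds the node-to-node pattern only. With three unknowns per node, each entry of this pattern would correspond to a 3x3 block in the actual system matrix.

```python
def csr_structure(num_nodes, triangles):
    """Return 1-based CSR pointer/column arrays (ia, ja) of the node graph.

    Two nodes are coupled when they belong to a common triangle; every node
    is coupled to itself, which provides the diagonal entry.
    """
    neighbours = [set() for _ in range(num_nodes)]
    for tri in triangles:
        for a in tri:
            for b in tri:
                neighbours[a].add(b)

    ia, ja = [1], []
    for row in neighbours:
        ja.extend(col + 1 for col in sorted(row))   # column indices of this row
        ia.append(len(ja) + 1)                      # start of the next row
    return ia, ja


# Two triangles sharing the edge between nodes 1 and 2 (0-based connectivity).
ia, ja = csr_structure(4, [(0, 1, 2), (1, 3, 2)])
print(ia)
print(ja)
```

Row i of the matrix occupies positions ia[i] to ia[i+1] - 1 of ja (1-based), so the pattern needs to be built only once per mesh and can be reused when the coefficient values are filled in.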

2.5.0.1.4 Set initial conditions.

The initial conditions (the values of u and p for levels n and n - 1 at the first time step)
are set from the values stored in the input DPlot file if a restart is requested. Otherwise, the
initial conditions are set to zero.

2.5.0.1.5 Build [A] matrix from SUPG/PSPG FEM and boundary conditions and store.

The  non-zero  values  in  the  system  matrix  are  calculated  from  (2.89)  and  the  boundary 
conditions.  All  values  in  this  matrix  are  geometry-based  and  do  not  change  from  time  step 
to time step (assuming boundary condition types do not change). Consequently, this matrix
is  calculated  only  once  and  stored  for  use  in  all  time  steps. 

2.5.0.1.6 Create {b} vector from SUPG/PSPG FEM and boundary conditions.

The  right-hand-side  vector  is  calculated  from  (2.89)  and  the  boundary  conditions.  This 
vector  is  calculated  at  each  time  step. 

2.5.0.1.7 Solve [A]{x} = {b} using SPARSKIT.

The  matrix  equation  is  passed  to  the  SPARSKIT  routines  to  be  solved.  Because  an 
iterative  method  is  used,  an  initial  guess  for  the  solution  is  required.  SFE2D  uses  the  zero 
vector  as  the  initial  guess  because  {x}  contains  the  change  in  flow  variables  between  time 
steps  and  the  time  steps  are  assumed  small  in  order  to  satisfy  the  stability  criteria  of  the 
explicit  convection  treatment.  Because  the  system  matrix  is  constant,  the  preconditioning 
matrix  is  calculated  only  at  the  first  time  step  and  stored  for  use  on  all  other  time  steps. 
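
2.5.0.1.8 Update variables for next time step.

Once the matrix equation has been approximately solved, the flow variables are updated.
That is,

$$\mathbf{u}_i^{n+1} = \mathbf{u}_i^{n} + \delta\mathbf{u}_i, \qquad p_i^{n+1} = p_i^{n} + \delta p_i,$$

and the stored time levels are shifted accordingly,

$$\mathbf{u}_i^{n-1} = \mathbf{u}_i^{n}, \quad \mathbf{u}_i^{n} = \mathbf{u}_i^{n+1}, \qquad p_i^{n-1} = p_i^{n}, \quad p_i^{n} = p_i^{n+1}.$$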

2.5.0.1.9 Output to file if requested.

The  Fortran  NAMELIST  file  contains  an  output  frequency  parameter  stating  the  number 
of  time  steps  between  writing  of  the  output  files.  If  the  current  time  step  requests  output 
files  to  be  written,  a  ‘restart’  file  is  written  in  DPlot  format  as  well  as  a  post-processing  file 
in  TecPlot  and/or  FieldView  unstructured  grid  format. 


Fig. 2.14 - Flow chart showing the basic structure of the SFE2D Navier-Stokes solver.


3.  2D  LAMINAR  FLOW  TEST  CASES 


This  chapter  presents  a  number  of  test  cases  showing  the  validity  of  the  SFE2D  solver. 
The  first  case  presented  is  that  of  steady  lid-driven  cavity  flow.  Next,  steady  flow  past  a 
backward-facing  step  is  calculated.  Finally,  time-accurate  results  for  flow  past  a  circular 
cylinder  in  a  channel  are  presented. 

3.1  Lid-Driven  Flow  in  a  Square  Cavity 

3.1.1  Problem  Description 

In  this  first  test  case,  flow  inside  a  square  cavity  as  depicted  in  Figure  3.1  is  considered. 
The cavity has unit dimensions (l = 1) and the flow is set in motion by driving the top wall
to  the  right  at  a  constant  velocity  (U  =  1).  No-slip  velocity  conditions  are  applied  on  all 
boundaries,  resulting  in  a  purely  Dirichlet  velocity  problem.  Because  the  pressure  field  need 
only  be  defined  up  to  an  arbitrary  constant,  a  pressure  datum  p  =  0  is  set  at  the  center  of 
the  top  wall. 


Fig.  3.1  -  Relevant  geometry  and  boundary  conditions  for  the  lid-driven  cavity  test  case. 


Although this case is purely academic in nature, it is useful in testing numerical algorithms
for solving the NS equations for a number of reasons. First, the boundary conditions
are well defined and easy to implement. There are no concerns about whether inflow/outflow
conditions are appropriately set, and the extents of the computational domain are clearly
defined. In addition, a number of different flow regimes are contained in the flow: a wall
boundary layer, an internal shear layer, and recirculating flow. Finally, the properties of the
flowfield  vary  distinctly  with  Reynolds  number.  The  main  characteristics  of  the  flowfield  are 
a  primary  recirculation  cell  near  the  center  of  the  cavity  and  pressure  singularities  at  the  top 
corners  where  there  is  a  discontinuity  in  the  velocity  boundary  conditions.  At  high  Reynolds 
numbers,  secondary  and  tertiary  vortices  develop  in  the  corners. 

The results of SFE2D are compared to the benchmark solutions of Ghia et al. (Ghia et al.,
1988). Ghia’s results were obtained in 1982 using a second-order finite-difference approach on
very  fine,  uniform  Cartesian  meshes  and  have  been  accepted  as  benchmark  solutions  for  many 
years.  Typical  mesh  sizes  used  by  Ghia  are  129  x  129  =  16641  nodes  and  257  x  257  =  66049 
nodes.  By  proper  mesh  design,  SFE2D  produces  results  similar  in  accuracy  using  far  fewer 
nodes. 

As  explained  in  Chapter  2,  SFE2D  is  an  unsteady  solver.  Consequently,  the  steady-state 
solutions  presented  here  are  obtained  by  starting  with  a  stagnant  flowfield  and  marching  in 
time until the flow no longer changes. Admittedly, SFE2D is not efficient at solving these
steady-state problems because it was designed specifically for time-accurate simulations. However,
the  lid-driven  cavity  test  case  allows  the  spatial  accuracy  of  the  algorithm  to  be  examined. 

3.1.2  Computational  Mesh 

The  computational  mesh  is  shown  in  Figure  3.2.  It  was  constructed  using  the  Delaundo 
grid  generator  developed  by  J.D.  Muller  at  the  von  Karman  Institute.  This  grid  generator 
is  based  on  ideas  from  both  the  Delaunay  triangulation  approach  and  the  advancing  front 
method (Muller et al., 1993), resulting in good connectivity, smooth point distributions, and
efficient  implementation.  The  user  supplies  the  boundary  geometry  and  point  distribution, 
and  Delaundo  creates  an  unstructured  mesh  of  approximately  isotropic  triangles. 

Near  solid  boundaries  it  is  often  necessary  to  stretch  the  nodes  close  to  the  wall  in 
order  to  resolve  the  boundary  layer.  If  this  is  done  with  isotropic  triangles,  Figure  3.3(a), 
the  node  spacing  tangent  to  the  wall  is  approximately  the  same  as  in  the  normal  direction. 
This  results  in  a  terrible  waste  of  nodes  because  the  gradients  tangent  to  the  wall  are  much 
smaller  than  the  normal  gradients.  What  is  preferred  is  to  use  high  aspect-ratio  elements 
in the boundary layer. Using standard unstructured triangle algorithms in these boundary
layer regions results in awkward triangles similar to that shown in Figure 3.3(b), which are
poorly suited for numerical computations. Babuska and Aziz (Babuska and Aziz, 1976) have
proved that the quality of a piecewise-linear FE approximation degrades for high-aspect-ratio
cells only if the maximum angle is too near 180 degrees. Consequently, structured ‘wedges,’
as  shown  in  Figure  3.3(c)  are  used  in  the  boundary  layer  regions. 
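
The maximum-angle criterion can be checked directly from the vertex coordinates. The following minimal sketch uses an illustrative function name and three hypothetical triangles: an isotropic one, a flat obtuse one, and a right-angled ‘wedge’ of the same base and height.

```python
import math


def max_angle_deg(tri):
    """Largest interior angle (degrees) of a triangle given as three (x, y) points."""
    angles = []
    for k in range(3):
        ax, ay = tri[k]
        bx, by = tri[(k + 1) % 3]
        cx, cy = tri[(k + 2) % 3]
        v1 = (bx - ax, by - ay)
        v2 = (cx - ax, cy - ay)
        cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        cos_a = max(-1.0, min(1.0, cos_a))     # guard against round-off
        angles.append(math.degrees(math.acos(cos_a)))
    return max(angles)


isotropic = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]
flat = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.01)]    # base 1, height 0.01, apex centred
wedge = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.01)]   # base 1, height 0.01, right angle

for name, tri in (("isotropic", isotropic), ("flat", flat), ("wedge", wedge)):
    print(name, round(max_angle_deg(tri), 2))
```

The flat triangle and the wedge have the same base and height, yet only the flat one has a maximum angle approaching 180 degrees. The wedge keeps its largest angle at 90 degrees, which is why it remains acceptable under the criterion quoted above.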

The  mesh  shown  in  Figure  3.2  consists  of  3013  nodes  and  5752  linear  triangle  elements. 
Boundary-layer  regions  of  structured  wedges  exist  along  each  of  the  four  walls.  Very  near 
the  corners  (within  0.025  units),  the  wedge  layers  give  way  to  very  small  isotropic  triangles 




Fig. 3.2 - Unstructured triangle mesh utilized for the lid-driven cavity test case, containing
3013  nodes  and  5752  elements. 


Fig.  3.3  -  Triangle  elements:  (a)  isotropic  triangle,  (b)  high  aspect  ratio  triangle  poorly  suited 
for  numerical  computations,  (c)  high  aspect  ratio  ‘wedge’  well  suited  for  numerical 
computations. 

(Figure  3.4).  This  allows  for  very  tight  clustering  near  the  corners  in  order  to  better  capture 
the  pressure  singularities  that  may  exist. 

3.1.3  Re  =  100 

A particle tracer visualization of the velocity field for Re = 100 is shown in Figure 3.5.
This figure, as well as the velocity vectors in Figure 3.6, shows that the primary vortex core is
located toward the upper-right corner of the cavity, at (x,y) = (0.623,0.741), at this Reynolds
number. Also, very weak vortices appear in the bottom corners. The associated pressure
field is displayed via contour lines in Figure 3.7, where singularities can be seen in the upper
corners.

A quantitative comparison with the Ghia benchmark results is made by plotting velocity
profiles through the geometric center of the cavity: the u-velocity profile through a vertical
cut and the v-velocity profile through a horizontal cut. The results of this comparison are
shown in Figure 3.8. Note that the agreement is excellent and that no velocity or pressure
oscillations are present.

Fig. 3.4 - Mesh in the lower-left corner of the domain, 0 < x, y < 0.05, showing the transition
from wedge-shaped triangles to isotropic triangles in the corners.

3.1.4  Re  =  1000 

A particle tracer visualization of the velocity field for Re = 1000 is shown in Figure 3.9.
This figure, as well as the velocity vectors in Figure 3.10, shows the primary vortex core
located very near the center of the cavity, at (x,y) = (0.525,0.572), at this Reynolds number.
The secondary vortices in the bottom corners are much larger than at Re = 100. The
associated pressure field is displayed via contour lines in Figure 3.11.

Again, a quantitative comparison with the Ghia benchmark results is made by plotting
velocity profiles through the geometric center of the cavity: the u-velocity profile through a
vertical cut and the v-velocity profile through a horizontal cut. The results of this comparison
are shown in Figure 3.12. As before, note that the agreement is excellent: the mesh is
stretched sufficiently toward the walls and corners to capture the velocity profile, although
only one-fifth as many nodes are used.

3.1.5  Re  =  5000 

A particle tracer visualization of the velocity field for Re = 5000 is shown in Figure 3.13.
This figure, as well as the velocity vectors in Figure 3.14, shows the primary vortex core
located very near the center of the cavity, at (x,y) = (0.515,0.530), at this Reynolds number.
The secondary vortices in the bottom corners are apparent, as well as a new vortex that
develops near the upper left corner. The associated pressure field is displayed via contour
lines in Figure 3.15.

Again, a quantitative comparison with the Ghia benchmark results is made by plotting
velocity profiles through the geometric center of the cavity: the u-velocity profile through a
vertical cut and the v-velocity profile through a horizontal cut. The results of this comparison
are shown in Figure 3.16. As before, note that the agreement is excellent: the mesh is
stretched sufficiently toward the walls and corners to capture the velocity profile, although
only one-fifth as many nodes are used.


Fig. 3.5 - Lid-driven cavity flow (Re = 100): streamtrace visualization of the flow field.

Fig. 3.6 - Lid-driven cavity flow (Re = 100): velocity vector visualization of the flow field.

Fig. 3.7 - Lid-driven cavity flow (Re = 100): contour plot visualization of the pressure field.

Fig. 3.8 - Lid-driven cavity flow (Re = 100): comparison of the results of this work (lines)
with the benchmark solution of Ghia et al. (Ghia et al., 1988) (markers). The
u(x = 0.5, y) profile is shown via the solid line and square markers. The v(x, y = 0.5)
profile is shown via the dashed line and diamond markers.

Fig. 3.12 - Lid-driven cavity flow (Re = 1000): comparison of the results of this work (lines)
with the benchmark solution of Ghia et al. (Ghia et al., 1988) (markers). The
u(x = 0.5, y) profile is shown via the solid line and square markers. The v(x, y = 0.5)
profile is shown via the dashed line and diamond markers.

3.2 Backward-Facing Step Flow

3.2.1  Problem  Description 

The backward-facing step case is flow in a channel having a sudden asymmetric expansion.
Separated flows resulting from such geometry changes are common in industrial internal
flow applications, where the performance of the device is very much dependent on the
structure of the flowfield. Backward-facing step flow is also a good CFD test case because the
accuracy of the solver can be tested by measuring the attachment length of the separation
bubble as a function of Reynolds number and comparing with experimental values.

Backward-facing step flows are characterized by the Reynolds number and the expansion
ratio (channel height downstream of the step divided by channel height upstream of the step).
For this test case the geometry and flow conditions of Armaly et al. (Armaly et al., 1983),
shown in Figure 3.17, are simulated. This particular experimental study was chosen because
great care was taken to ensure 2D flow upstream of the step and because attachment lengths
are presented for varying Reynolds numbers. Also, at the expansion ratio chosen by Armaly et
al., a secondary recirculation cell may appear, providing yet another check on the accuracy of
the solver. The flow enters through a long entrance channel (4.08 step heights (S) long in the
experiment), past the step, and through the exit channel (> 67.5S long in the experiment).
The upstream channel height is h = 1.06S and the downstream channel height is H = 2.06S,
resulting in a channel expansion ratio of 1:1.94. Armaly et al. provide results for Reynolds
numbers ranging from Re = 100 to Re = 7000. Three Reynolds number values were simulated
using SFE2D, all of which come from the laminar regime (Re < 1200): 150, 500, and 1000.
The definition of the Reynolds number used in this case is given by

$$ Re = \frac{V D}{\nu}, \tag{3.1} $$

where V is two-thirds of the measured maximum inlet velocity (which corresponds to the
average inlet velocity), D is the hydraulic diameter of the inlet channel (twice its height),
and ν is the kinematic viscosity.

The  flowfield  contains  a  primary  recirculation  cell  brought  about  by  separation  off  the 
step  edge.  At  certain  Reynolds  numbers,  a  secondary  cell  appears  on  the  top  wall  due  to  the 
adverse  pressure  gradient  created  by  the  sudden  expansion.  The  existence  of  this  secondary 
recirculation  cell  is  dependent  on  the  expansion  ratio  and  the  Reynolds  number.  According 
to  Armaly’s  experiments,  at  Re  =  150  only  the  primary  recirculation  cell  exists,  while  at 
Re  =  500  and  1000  the  secondary  recirculation  cell  is  also  apparent. 

The three calculated lengths x_1, x_4, and x_5 shown in Figure 3.17 are compared with
the published experimental values. The length x_1 is the distance from the step edge to
the primary reattachment point on the lower wall. The distance from the step edge to the
separation point of the secondary recirculation cell on the top wall is x_4, and the distance
from the step edge to the reattachment point of the secondary recirculation cell on the top
wall is x_5.

Fig. 3.17 - Flow over a backward-facing step: geometry with pertinent lengths and general
flowfield properties.

In  comparing  the  results  of  SFE2D  to  Armaly’s,  it  is  important  to  know  the  experimental 
error  in  the  measured  data.  Unfortunately,  his  paper  gives  few  clues  as  to  the  uncertainty 
of  the  results.  This  uncertainty  is  compounded  because  the  original  quantitative  data  is  not 
available.  Instead,  one  is  forced  to  read  the  points  off  the  graph  included  in  Armaly’s  paper. 
In  this  case,  the  graph  was  scanned  into  a  computer  at  a  high  resolution,  and  the  data  points 
were  digitized  from  the  resulting  image.  The  errors  from  this  procedure  alone  are  estimated 
at  5-8%. 

3.2.2  Computational  Mesh 

The computational mesh used in this case consists of 12,000 nodes, which are clustered in
the area of the step and 21S down the channel, where the recirculation regions are expected.
Figure 3.18 shows the mesh in the region of the step, giving an idea of the mesh density.
Additionally, a mesh containing 20,000 nodes was created in order to check grid convergence
of the solutions. The domain extends 5S upstream of the step and 35S downstream. Note
that in this case there are no boundary layer 'wedge' elements. Rather, all elements are
nearly isotropic triangles.

The Gambit unstructured mesh generator (Fluent, Inc.) (flu, 1998) was used to create
the mesh. This generator produces meshes of comparable quality to Delaundo (including
the capability to create boundary layer 'wedge' elements) and provides a graphical interface
for easy mesh creation and modification.

At the inlet to the domain a fully-developed parabolic profile was prescribed of the form

$$ u = \frac{6U}{h^2}\left((y - y_b)h - (y - y_b)^2\right), \tag{3.2} $$

where U is the mean inlet velocity, h is the inlet channel height, and y_b is the y-coordinate
of the inlet channel bottom wall. At the outlet, a Neumann velocity boundary condition was
applied. Finally, no-slip conditions were applied on all channel walls.

Fig. 3.18 - Flow over a backward-facing step: computational mesh in the region of the step.
The complete mesh contains approximately 12,000 nodes/24,000 elements.
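
As a quick consistency check on Eq. (3.2), the sketch below evaluates the profile numerically and confirms that its average equals U and its peak equals 1.5U, which matches the statement that the Reynolds number velocity is two-thirds of the maximum inlet velocity. The numerical values of U, h, and y_b are illustrative assumptions, not the values used in the simulations.

```python
def inlet_velocity(y, U, h, y_b):
    """Parabolic inlet profile of Eq. (3.2)."""
    eta = y - y_b  # distance above the inlet channel bottom wall
    return 6.0 * U / h**2 * (eta * h - eta**2)

def mean_velocity(U, h, y_b, n=1000):
    """Midpoint-rule average of the profile over the inlet channel height."""
    dy = h / n
    total = sum(inlet_velocity(y_b + (i + 0.5) * dy, U, h, y_b) for i in range(n))
    return total * dy / h

# Illustrative values only (assumed for this sketch).
U, h, y_b = 1.0, 1.06, 1.0

u_mean = mean_velocity(U, h, y_b)
u_max = inlet_velocity(y_b + 0.5 * h, U, h, y_b)  # peak at mid-height
u_wall = inlet_velocity(y_b, U, h, y_b)           # no-slip at the wall
print(u_mean, u_max, u_wall)
```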

3.2.3  Results 

Steady-state results were obtained by starting with a stagnant field and advancing in
time until the solution no longer varied in time. Figure 3.19 shows u-velocity contours for
Reynolds numbers 150, 500, and 1000. Qualitative agreement with experimental observations
is achieved. Only the primary recirculation region is apparent at Re = 150, while at Re = 500
and 1000, both the primary and secondary recirculation regions exist. For completeness, a
representative (Re = 500) pressure field in the step region is provided in Figure 3.20.

Fig. 3.19 - Flow over a backward-facing step: contours of constant streamwise velocity in the
region -1 < x/S < 27 for Reynolds number (a) 150, (b) 500, and (c) 1000.

The computed detachment/reattachment lengths described in Figure 3.17 are compared with
the experimental results in Table 3.1. At Re = 150 and 500 the errors are certainly
acceptable in that, with the exception of x_5/S on the coarse mesh, they are within the
uncertainty at which Armaly's data is known (5-8%). At Re = 1000 the errors are quite
large, however. Though larger errors are expected at higher Reynolds numbers, there may
be an additional explanation. In the experiments Armaly observed that, though the flow was
laminar, it was beginning to exhibit 3D features at this Reynolds number. Consequently,
using a 2D solver for this flow is not strictly correct and likely adds to the error. Also
shown for Re = 1000 are results from three commercial CFD solvers: FLOTRAN, Flow-3D,
and Fluent. These results are taken from a CFD benchmark summary compiled by Freitas
(Freitas, 1995) and represent solutions provided by the respective companies for this test
case. Though the grid structure, grid size, and discretization strategy may differ from those
used here, it is useful to see that SFE2D provides results similar to commercial CFD codes.

Fig. 3.20 - Flow over a backward-facing step: contours of constant pressure in the region
-3 < x/S < 8 for Reynolds number 500.

Table 3.1 - Flow over a backward-facing step: comparison of detachment/reattachment
lengths. Experimental results are from Armaly et al. (Armaly et al., 1983);
numerical results compiled by Freitas (Freitas, 1995).

                     x1/S   % error    x4/S   % error    x5/S   % error
Re = 150
  Armaly et al.      4.02
  SFE2D (coarse)     3.92     2.49
  SFE2D (fine)       3.95     1.74
Re = 500
  Armaly et al.     10.00              8.29             13.50
  SFE2D (coarse)     9.42     5.80     8.58     3.50    12.26     9.19
  SFE2D (fine)       9.50     5.00     8.27     0.24    12.57     6.89
Re = 1000
  Armaly et al.     16.20             13.30             21.70
  SFE2D (coarse)    13.31    17.84    10.49    21.13    24.58    13.27
  SFE2D (fine)      13.33    17.72    10.85    18.42    24.06    10.88
  FLOTRAN            8.57    47.10     6.23    53.16    16.66    23.23
  Flow-3D           12.23    24.51     9.50    28.57    22.40     3.23
  Fluent            13.08    19.26    10.33    22.33    24.70    13.82
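
The percentage errors in Table 3.1 are consistent with a relative error taken with respect to the experimental value. The sketch below assumes that definition (it is not stated explicitly in the text) and recomputes the Re = 1000 fine-mesh row from the tabulated lengths.

```python
def percent_error(computed, reference):
    """Relative error in percent, with the experimental value as reference (assumed definition)."""
    return abs(computed - reference) / reference * 100.0

# Re = 1000 rows of Table 3.1.
armaly = {"x1/S": 16.20, "x4/S": 13.30, "x5/S": 21.70}
sfe2d_fine = {"x1/S": 13.33, "x4/S": 10.85, "x5/S": 24.06}

errors = {key: percent_error(sfe2d_fine[key], armaly[key]) for key in armaly}
for key, value in errors.items():
    print(f"{key}: {value:.2f} %")
```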

Finally, Figure 3.21 shows streamwise velocity profiles at three locations in the channel
(x/S = -1.76, 7.04, and 19.04) at Re = 1095. The agreement between the coarse mesh
solution, fine mesh solution, and experiment is satisfactory.

Fig. 3.21 - Flow over a backward-facing step: streamwise velocity profiles at three locations
in the channel (x/S = -1.76, 7.04, and 19.04) and Re = 1095. SFE2D coarse mesh
(dashed line), SFE2D fine mesh (solid line), Armaly et al. (Armaly et al., 1983) (o).


3.3  Circular  Cylinder  Flow 

3.3.1  Problem  Description 

The flow over a circular cylinder is interesting from an engineering viewpoint because
it contains several distinct flow characteristics, including boundary layers, flow separation,
shear layers, and a wake region. This flow is certainly among the most extensively studied,
both numerically and experimentally. Consequently, it is an excellent test case because a large
amount of data is available for comparison. Given this vast amount of available
data, review articles are a good means of gaining an understanding of the basic characteristics
of this flow. A number of reviews have been published from as far back as 1964, including
Morkovin (Morkovin, 1964), Berger and Wille (Berger and Wille, 1972), Beaudan and Moin
(Beaudan and Moin, 1994), and Williamson (Williamson, 1996b).

The behavior of flow over a circular cylinder is highly dependent upon Reynolds number,
in this case typically based upon the cylinder diameter D and the freestream velocity U_0, or

$$ Re_D = \frac{U_0 D}{\nu}. \tag{3.3} $$

At very low Reynolds numbers, approximately 5 < Re_D < 40, steady laminar flow exists
with a pair of symmetric counter-rotating vortices behind the cylinder. This flow structure is
seen in Figure 3.22(a) taken from Van Dyke (VanDyke, 1982), which shows an experimental
visualization at Re_D = 26. The range of Reynolds numbers from approximately 40 < Re_D <
190 is often termed the laminar vortex shedding regime. The recirculation cells develop
instabilities, whose strength and amplification grow with Re_D. These instabilities induce
laminar vortex shedding, also known as the Karman vortex street. Figure 3.22(b), also taken
from Van Dyke (VanDyke, 1982), shows an experimental visualization of the Karman vortex
street at Re_D = 140.


Fig.  3.22  -  2D  circular  cylinder  flow:  experimental  flow  visualizations  from  VanDyke 
(VanDyke,  1982).  (a)  ReD  =  24,  (b)  ReD  =  140. 


At Reynolds numbers above approximately 190, the wake behind the cylinder becomes
unstable to transverse perturbations and develops 3D structures. As a consequence, 2D
calculations tend to yield incorrect values of the flow parameters (lift, drag, shedding frequency,
etc.) at these Reynolds numbers. For this 2D code validation then, the maximum Reynolds
number used was 140, well below the initiation of 3D flow features.

A number of flow quantities can be computed and compared with existing numerical and
experimental data. In this study, the pressure drag,

$$ C_{Dp} = \frac{1}{\rho U_0^2} \int_0^{2\pi} P \cos\theta \, d\theta, \tag{3.4} $$

viscous drag,

$$ C_{Dv} = \frac{D}{Re_D U_0} \int_0^{2\pi} \omega_w \sin\theta \, d\theta, \tag{3.5} $$

and total drag

$$ C_D = C_{Dp} + C_{Dv}, \tag{3.6} $$

were validated. In addition, at the unsteady Reynolds numbers the non-dimensional vortex
shedding frequency, or Strouhal number,

$$ St = \frac{f D}{U_0}, \tag{3.7} $$

was also validated. In the above equations, ρ is the density, U_0 the freestream velocity, D
the cylinder diameter, f the shedding frequency, ω_w the wall vorticity, and P the pressure.


3.3.2  Computational  Mesh 

The computational domain, displayed in Figure 3.23, is rectangular in shape and extends
20D upstream, 20D downstream, and 30D laterally from the center of the cylinder.
Kravchenko and Moin (Kravchenko and Moin, 1998) found that C_D, C_L, and St do not
become independent of lateral domain size until approximately 60D. However, the relative
difference between these quantities is less than 4% when the lateral boundaries are moved
from 120D to 30D.

The domain was discretized into 31,000 linear triangle elements with 16,000 nodes at
the vertices. As seen in Figure 3.23, elements are clustered in the wake region, with the
characteristic element side length being approximately 0.25D. Outside the wake region the
elements are less dense, with characteristic element side lengths of 0.5D at the upstream and
lateral extents. Figure 3.24 shows the spatial discretization near the cylinder. Elements are
clustered toward the cylinder in order to better capture the initial development of the shed
vorticity, with element side lengths on the cylinder wall of 0.028D. Boundary layer wedge-type
elements were used very near the cylinder walls to capture the boundary layer, with the
perpendicular dimension of the first element off the cylinder being 0.005D.

The boundary conditions were as follows. At the upstream boundary, a fixed-velocity
inlet condition was used, with u = U_0 and v = 0. A standard Neumann velocity outlet
condition was utilized at the downstream boundary, while slip wall conditions were used at
the lateral boundaries. On the cylinder surface a no-slip wall condition was used.

The time step was Δt U_0/D = 0.01, resulting in approximately 550 time steps per
oscillation period of the lift force for Re_D = 140, and slightly more for the Re_D = 100 and 60
cases.
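
The quoted step count can be turned into a rough Strouhal number estimate, since the lift force oscillates at the shedding frequency. The short sketch below uses only the non-dimensional time step and the approximate 550 steps per period stated above; the function name is hypothetical.

```python
def strouhal_from_steps(steps_per_period, dt_nondim):
    """St = f D / U_0 for a shedding period spanning the given number of time steps.

    dt_nondim is the non-dimensional time step, dt * U_0 / D.
    """
    period_nondim = steps_per_period * dt_nondim  # T * U_0 / D
    return 1.0 / period_nondim

st_estimate = strouhal_from_steps(550, 0.01)
print(f"St is roughly {st_estimate:.3f} at Re_D = 140")
```

Because the step count is only approximate, this is an order-of-magnitude check rather than a validated result.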

Finally, mesh convergence was verified by running fine-mesh simulations. The mesh used
for these simulations had characteristic element side lengths of half that of the coarse mesh
in the wake region, along the cylinder wall, and in the boundary layer. This mesh refinement
resulted in a total of 79,000 elements with 40,000 nodes. In addition, the time step was
halved to Δt U_0/D = 0.005 for these fine-mesh simulations.

Fig. 3.23 - 2D circular cylinder flow: overview of the computational domain and mesh.

3.3.3  Results 

Simulations were performed at one steady (Re_D = 20) and three unsteady (Re_D =
60, 100, and 140) Reynolds numbers. In each case, potential flow was used as the initial
condition and the simulation advanced in time until a statistically stationary flow pattern
had developed. In the initial simulations, the vortex shedding was allowed to develop without
introducing perturbations to the flow. That is, the shedding was induced by numerical
truncation and round-off errors that eventually break the symmetry of the solution. This
turned out to be a very slow process. So, in order to initiate vortex shedding more quickly,
a time-dependent rotation was applied to the cylinder for a short time. The flow was then
advanced in time until all transient features exited the domain.

The qualitative behavior of the computed flowfield is seen via contours of constant
vorticity shown in Figure 3.25. At Re_D = 20, two stationary attached vortices are formed
behind the cylinder. At the higher Reynolds numbers, the wake behind the cylinder consists
of negative and positive vortices shed alternately from the upper and lower portions of the
cylinder surface.

Fig. 3.24 - 2D circular cylinder flow: the computational mesh near the cylinder. (a) shows
the clustering of elements toward the cylinder, while (b) shows the boundary layer
wedge-type elements very near the cylinder wall.

Quantitative validation of the flow solver is achieved by comparing C_Dp, C_Dv, and St with
published values. The drag coefficient benchmarks are taken from Henderson (Henderson,
1995). Henderson's curves are fits to numerical data from high-order 2D spectral-element
simulations that produced results in excellent agreement with the experiments of Williamson
and Roshko (Williamson and Roshko, 1990). The Strouhal number benchmark is the so-called
"universal" Strouhal curve for a circular cylinder as put forth by Williamson (Williamson,
1988). This curve is a fit through numerous experiments conducted using various techniques.
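
For orientation, a Strouhal-Reynolds fit of this kind can be evaluated as sketched below. The three-term form St = A/Re_D + B + C Re_D and the coefficient values are assumptions here: they are the values commonly quoted for Williamson's laminar-regime fit, they do not appear in this text, and they should be verified against the original reference before use.

```python
def strouhal_fit(re_d, a=-3.3265, b=0.1816, c=1.600e-4):
    """Three-term Strouhal-Reynolds fit, St = a/Re + b + c*Re.

    Coefficients are assumed values (commonly quoted for the laminar
    shedding regime); they are not taken from this document.
    """
    return a / re_d + b + c * re_d

# Reynolds numbers of the unsteady cases simulated in this chapter.
fit_values = {re_d: strouhal_fit(re_d) for re_d in (60, 100, 140)}
for re_d, st in fit_values.items():
    print(f"Re_D = {re_d}: St = {st:.4f}")
```

Under these assumed coefficients the fit gives St of about 0.136 at Re_D = 60, close to the SFE2D value of 0.137 reported in Table 3.2.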

The  SFE2D  results  compared  with  the  benchmark  values  are  shown  in  Figure  3.26.  The 
agreement  is  excellent  for  all  Reynolds  numbers.  Also  seen  in  this  figure  is  that  coarse-  and 
fine-mesh  results  are  virtually  indistinguishable  from  one  another.  Consequently,  the  solution 
is  deemed  grid  converged. 

As a final note, the SFE2D solver results can be compared with the results of commercial
CFD codes. The Re_D = 60 case is one of a number of test cases included in the CFD
benchmark summary compiled by Freitas (Freitas, 1995). The comparison is summarized
in Table 3.2. Based on the experimental work of Tritton (Tritton, 1959), the measured
drag coefficient is 1.47. According to Schlichting (Schlichting, 1979), the Strouhal number is
measured to be 0.14. Though the grid structure, grid size, and discretization strategy may
differ, it is useful to see that SFE2D provides results at least as accurate as the commercial
CFD codes.

Fig. 3.25 - 2D circular cylinder flow: 50 instantaneous contours of constant vorticity
magnitude (0 < |ω|D/U_0 < 150) in the region -1.0D < x < 15.0D, -2.0D < y < 2.0D.
(a) Re_D = 20, (b) Re_D = 60, (c) Re_D = 100, (d) Re_D = 140. Flow patterns (a)-(d)
correspond to the maximum value of the lift coefficient.


Fig. 3.26 - 2D circular cylinder flow: quantitative comparison with published values of
C_Dp, C_Dv, and St vs. Reynolds number Re_D. Lines: experimental fit by Williamson
(Williamson, 1988) and numerical fit by Henderson (Henderson, 1995). Symbols:
o, O, □, SFE2D coarse mesh; *, +, x, SFE2D fine mesh.


Table 3.2 - 2D circular cylinder flow: comparison of results from commercial CFD solvers
(Freitas, 1995) for drag coefficient C_D and Strouhal number St at Re = 60. The
experimental values for C_D and St are taken from Tritton (Tritton, 1959) and
Schlichting (Schlichting, 1979), respectively.

                 Lateral/inlet/outlet
                 distance               Grid resolution            C_D      St
Experiment                                                         1.47     0.14
SFE2D            30D/20D/20D            16,000 nodes (uns)         1.405    0.137
FLOW-3D          6.5D/2.5D/22D          198 x 80 (cartesian)       1.77     0.15
FLOTRAN          8D/~10D/~20D           21,498 nodes (uns)         1.44     NA
FLUENT           5D/NA/20D              101 x 201 (O-H)            1.567    0.15
CFDS-FLOW3D      NA/3.5D/9.5D           3,384 nodes (block-str)    0.037    NA
NISA/3D-FLUID    10D/8D/20D             3,603 nodes (uns)          1.343    NA

4. 3D LAMINAR FLOW SOLVER

In  this  chapter,  extension  of  the  2D  stabilized  FE  solver  derived  in  Chapter  2  to  a 
3D  FE/spectral  solver  is  explained.  Taking  inspiration  from  Henderson  (Henderson,  1999), 
this  solver  utilizes  an  in-plane  SUPG/PSPG  FEM  discretization  and  a  collocated  spectral 
discretization  in  the  transverse  direction  to  recast  the  3D  problem  into  a  set  of  coupled  2D 
problems  in  Fourier  space. 

First, the governing equations are presented in a form that separates the in-plane and
transverse components. The discretization of these equations, first in the plane and then in
the transverse direction, is explained. A description of the special treatment of the convective
terms (including SUPG terms) required to reduce computational cost follows. The
parallelization approach is discussed and, finally, the details of the solver implementation are
presented by means of an in-depth algorithm description.


4.1  Governing  Equations 

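Consider laminar, incompressible, isothermal flow in the absence of body forces. The
governing Navier-Stokes equations are

$$ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla p + \nu \nabla^2 \mathbf{u} \tag{4.1} $$

$$ \nabla \cdot \mathbf{u} = 0, \tag{4.2} $$

where u is the 3D velocity vector, p is the kinematic pressure (pressure divided by density),
and ν is the kinematic viscosity.

The solution technique developed in this work employs a different discretization method
for the in-plane and transverse directions. Consequently, the governing equations are now
written in a form that separates these components. In order to accomplish this, the following
notation is adopted for the remainder of this work.

(x, y)                   in-plane coordinates
z                        transverse coordinate
$\tilde{\mathbf{u}}$     2D velocity vector, (u, v)
w                        transverse velocity component
$\tilde{\nabla}$         2D gradient operator, (∂/∂x, ∂/∂y)

Equations (4.1) and (4.2) can now be written as

$$ \frac{\partial \tilde{\mathbf{u}}}{\partial t} + (\tilde{\mathbf{u}} \cdot \tilde{\nabla})\tilde{\mathbf{u}} + w \frac{\partial \tilde{\mathbf{u}}}{\partial z} = -\tilde{\nabla} p + \nu \left( \tilde{\nabla}^2 + \frac{\partial^2}{\partial z^2} \right) \tilde{\mathbf{u}} \tag{4.3} $$

$$ \frac{\partial w}{\partial t} + (\tilde{\mathbf{u}} \cdot \tilde{\nabla}) w + w \frac{\partial w}{\partial z} = -\frac{\partial p}{\partial z} + \nu \left( \tilde{\nabla}^2 + \frac{\partial^2}{\partial z^2} \right) w \tag{4.4} $$

$$ \tilde{\nabla} \cdot \tilde{\mathbf{u}} + \frac{\partial w}{\partial z} = 0. \tag{4.5} $$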


4.2  Temporal  Discretization 

The temporal discretization of the 3D solver is identical to that of the 2D solver. For
the pressure and diffusion terms, a second-order Crank-Nicolson method is applied. In terms
of a simple example first-order ordinary differential equation (ODE)

$$ \frac{df}{dt} = A f, \tag{4.6} $$

this method results in the discrete expression

$$ \frac{f^{n+1} - f^n}{\Delta t} = A \, \frac{f^{n+1} + f^n}{2}, \tag{4.7} $$

where [ ]^{n+1} denotes the new time step and [ ]^n the current time step.

It is apparent from (4.7) that the Crank-Nicolson method is based on central differencing
(centered about time n + 1/2) and hence is second-order accurate. Due to the implicitness of
the scheme, it is unconditionally stable for all time step values,
meaning that perturbations are not amplified with time. In practice, though, it is found that
oscillations can occur for sufficiently large time step values.
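
The two properties noted above can be checked on the scalar model problem. The following sketch assumes a constant real coefficient A (here called lam) and the test values shown; it measures the observed order of accuracy and evaluates the amplification factor for a very large time step, where the solution stays bounded but alternates in sign.

```python
import math

def cn_amplification(z):
    """Crank-Nicolson amplification factor f^{n+1}/f^n for z = A * dt."""
    return (1.0 + 0.5 * z) / (1.0 - 0.5 * z)

def crank_nicolson(lam, f0, dt, n_steps):
    """Advance df/dt = lam * f with Crank-Nicolson for n_steps steps."""
    f = f0
    g = cn_amplification(lam * dt)
    for _ in range(n_steps):
        f *= g
    return f

def cn_error(dt, lam=-1.0, t_end=1.0):
    """Absolute error at t_end against the exact solution exp(lam * t)."""
    n_steps = round(t_end / dt)
    return abs(crank_nicolson(lam, 1.0, dt, n_steps) - math.exp(lam * t_end))

# Halving the time step should reduce the error by about a factor of four.
cn_order = math.log2(cn_error(0.1) / cn_error(0.05))

# Large time step: |g| < 1 (stable) but g < 0 (sign-alternating oscillation).
g_large = cn_amplification(-10.0)
print(f"observed order = {cn_order:.2f}, g(z = -10) = {g_large:.3f}")
```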


The convective terms are treated using a second-order explicit Adams-Bashforth method.
Again, in terms of the example first-order ODE (4.6), the method results in the discrete
expression

$$ \frac{f^{n+1} - f^n}{\Delta t} = A \left[ \frac{3}{2} f^n - \frac{1}{2} f^{n-1} \right]. \tag{4.8} $$

As with all explicit schemes, the Adams-Bashforth method imposes a stability limit on the
time step size. As mentioned in Chapter 2, although this can be a serious limitation in some
cases, an explicit treatment of the convection terms is acceptable (and even desirable) in this
work for three primary reasons.


• The convection terms are non-linear. Therefore, if treated explicitly, the system is linear
within each time step and there is no need for multiple iterations within a time step or
the necessity to calculate expensive Jacobian matrices.

• The ultimate intent of this solver is to perform LES. Since LES requires very small time
steps anyway, the time step limitation due to the explicit convection treatment is less
important.

• As will be shown later, explicit treatment of the non-linear terms decouples the Fourier
modes within each time step when the FE/spectral discretization is used.
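
The accuracy and the time step limit of the second-order Adams-Bashforth scheme can be illustrated on the same scalar model problem. In this sketch the coefficient A is taken as a real constant lam, and the first step is seeded with the exact solution; both choices are assumptions made for the illustration, not a description of the solver's start-up procedure.

```python
import math

def adams_bashforth2(lam, f0, dt, n_steps):
    """Advance df/dt = lam * f to step n_steps with second-order Adams-Bashforth."""
    f_prev = f0
    f = f0 * math.exp(lam * dt)  # exact value used for the start-up step (assumption)
    for _ in range(n_steps - 1):
        f_new = f + dt * lam * (1.5 * f - 0.5 * f_prev)
        f_prev, f = f, f_new
    return f

def ab2_error(dt, lam=-1.0, t_end=1.0):
    """Absolute error at t_end against the exact solution exp(lam * t)."""
    n_steps = round(t_end / dt)
    return abs(adams_bashforth2(lam, 1.0, dt, n_steps) - math.exp(lam * t_end))

# Halving the time step should reduce the error by roughly a factor of four.
ab2_order = math.log2(ab2_error(0.1) / ab2_error(0.05))

# A decaying problem integrated with too large a step grows instead of decaying.
f_small_dt = adams_bashforth2(-1.0, 1.0, 0.1, 400)
f_large_dt = adams_bashforth2(-1.0, 1.0, 1.5, 40)
print(f"observed order = {ab2_order:.2f}")
print(f"|f| with dt = 0.1: {abs(f_small_dt):.2e}, with dt = 1.5: {abs(f_large_dt):.2e}")
```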


4.3  Spectral  Methods 

Spectral methods are characterized by the expansion of the solution in terms of global,
orthogonal polynomials. This global expansion, as opposed to local expansions used in the
FDM, FVM, and FEM, allows a much faster (exponential) convergence rate for smooth
solutions, which is the main advantage of the method.

Numerical spectral methods for PDEs were originally developed by meteorologists some
half a century ago (Silberman, 1954). However, nonlinearities present a considerable
computational expense, which was a severe drawback to the method until alternative
"pseudo-spectral" transform methods (to be discussed shortly) were developed (Orszag, 1969;
Eliasen et al., 1970).

The main difficulty in applying spectral methods to general fluid dynamics problems
arises in implementing boundary conditions on anything other than simple geometries. To
work around this problem, spectral multi-domain (or spectral element) techniques have been
developed and remain a current area of intense research (Patera, 1984; Henderson, 1999). In
these methods, the full, complex domain is partitioned into subdomains of simple geometry
and a spectral method is applied in each subdomain.

A  number  of  different  spectral  methods  have  been  developed.  The  reader  is  referred  to 
Canuto  et  al.  (Canuto  et  al.,  1988)  or  Hussaini  and  Zang  (Hussaini  and  Zang,  1987)  for  a 
review  of  spectral  methods  and  their  application  to  fluid  dynamics.  In  this  work,  the  flow 
is  assumed  to  be  periodic  in  the  transverse  direction.  Therefore,  a  simple  collocated  Fourier 
spectral  method,  discussed  later  in  this  chapter,  is  well  suited. 

4.4  Finite-Element/Spectral  Discretization 

Two  options  exist  in  performing  the  spatial  discretization  of  equations  (4.3)-(4.5).  One 
can  first  perform  the  Fourier  decomposition  in  the  periodic  direction  and  then  the  FEM 
discretization  in  the  plane,  or  vice-versa.  For  a  pure  Galerkin  or  Galerkin/PSPG  formulation, 
the  two  approaches  produce  the  same  result.  However,  the  use  of  convective  stabilization 
(SUPG)  requires  that  the  in-plane  discretization  be  performed  first,  and  then  the  Fourier 
decomposition  in  the  transverse  direction. 

The  reason  for  this  can  immediately  be  seen  if  the  Fourier  decomposition  is  performed 
first.  To  do  so,  we  assume  the  flow  to  be  periodic  in  the  transverse  ( z )  direction  and  known 
at  discrete  points  in  space.  Consequently,  the  flowfield  can  be  developed  as  a  discrete  Fourier 
sum. That is, any quantity $q_n(x,y,t) = q(x,y,z_n,t)$ is expressed as

$$q_n(x,y,t) = \frac{1}{N}\sum_{k=0}^{N-1} \hat{q}_k(x,y,t)\, e^{2\pi I kn/N} \tag{4.9}$$

or, equivalently,

$$q(x,y,z_n,t) = \frac{1}{N}\sum_{k=0}^{N-1} \hat{q}_k(x,y,f_k,t)\, e^{2\pi I f_k z_n} \tag{4.10}$$

where $I = \sqrt{-1}$, $N$ is the number of discrete samples in the set, and $L$ is the dimension of the
transverse domain, so that $f_k = k/L$ and $z_n = nL/N$. The equations for the discrete Fourier modes
$\hat{q}_k$ are obtained by taking the discrete Fourier transform (DFT) of the governing equations
(4.3)-(4.5). The DFT is defined as

$$\hat{q}_k(x,y,t) = \sum_{n=0}^{N-1} q_n(x,y,t)\, e^{-2\pi I nk/N}. \tag{4.11}$$
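The transform pair (4.9) and (4.11) can be checked numerically. The following is a minimal sketch, not the solver's implementation: it uses a direct O(N^2) summation rather than an FFT, and the sample values are hypothetical spanwise samples at a single in-plane node.

```python
import cmath


def dft(samples):
    """Forward DFT, eq. (4.11): q_hat[k] = sum_n q[n] * exp(-2*pi*I*n*k/N)."""
    n_pts = len(samples)
    return [sum(samples[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]


def idft(coeffs):
    """Inverse DFT, eq. (4.9): q[n] = (1/N) * sum_k q_hat[k] * exp(+2*pi*I*k*n/N)."""
    n_pts = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
                for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


# Hypothetical spanwise samples of one flow quantity at a single (x, y) node.
q = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
q_hat = dft(q)
q_back = idft(q_hat)

# Largest difference between the original samples and the round trip.
round_trip_error = max(abs(a - b) for a, b in zip(q, q_back))
```

With this normalization the factor 1/N appears only in the inverse transform, so mode 0 equals the plain sum of the samples.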


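Upon execution of the transform, the governing equations for the Fourier coefficients of
the velocity vector and pressure are

$$\frac{\partial \hat{\mathbf{u}}_k}{\partial t} + \underbrace{\sum_{n=0}^{N-1}\left[(\mathbf{u}_n\cdot\nabla)\mathbf{u}_n + w_n\left(\frac{\partial \mathbf{u}}{\partial z}\right)_n\right] e^{-2\pi I nk/N}}_{\hat{\mathbf{h}}_k} = -\nabla\hat{p}_k + \nu\left(\nabla^2 - (2\pi k/L)^2\right)\hat{\mathbf{u}}_k \tag{4.12}$$

$$\frac{\partial \hat{w}_k}{\partial t} + \underbrace{\sum_{n=0}^{N-1}\left[(\mathbf{u}_n\cdot\nabla)w_n + w_n\left(\frac{\partial w}{\partial z}\right)_n\right] e^{-2\pi I nk/N}}_{\hat{h}_{zk}} = -(2\pi I k/L)\,\hat{p}_k + \nu\left(\nabla^2 - (2\pi k/L)^2\right)\hat{w}_k \tag{4.13}$$

$$\nabla\cdot\hat{\mathbf{u}}_k + (2\pi I k/L)\,\hat{w}_k = 0. \tag{4.14}$$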

Note that the convective terms $\hat{\mathbf{h}}_k$ and $\hat{h}_{zk}$ in the above expressions have not yet been
evaluated. Because the convective terms are nonlinear, their Fourier transforms cannot
be written in the form $(\hat{\mathbf{u}}_k\cdot\nabla)\hat{\mathbf{u}}_k$ and $(\hat{\mathbf{u}}_k\cdot\nabla)\hat{w}_k$ that is needed to define the SUPG terms
as described in Section 2.2.2.
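
To make the obstruction explicit, consider the simplest nonlinear term, the product of two sampled quantities $a_n$ and $b_n$ (generic symbols introduced here for illustration only). Substituting the expansion (4.9) for each factor into the transform (4.11) and using the orthogonality of the discrete exponentials gives

$$\sum_{n=0}^{N-1} a_n b_n\, e^{-2\pi I nk/N} = \frac{1}{N}\sum_{m=0}^{N-1} \hat{a}_m\, \hat{b}_{(k-m) \bmod N}.$$

The transform of a product is therefore a circular convolution over all modes, not the product $\hat{a}_k\hat{b}_k$ of the mode-$k$ coefficients, which is why mode $k$ of the convective term cannot be expressed through $\hat{\mathbf{u}}_k$ alone.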

So,  one  must  first  define  a  SUPG/PSPG  semi-discretization  for  the  original  3D  equations 
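(4.3)-(4.5). Following the procedure given in Chapter 2, we derive

$$\sum_e\Bigg[\int_{\Omega_e} N_i\left(\frac{\partial u}{\partial t} + (\mathbf{u}\cdot\nabla)u + w\frac{\partial u}{\partial z} - \nu\frac{\partial^2 u}{\partial z^2}\right)d\Omega_e - \int_{\Omega_e}\frac{\partial N_i}{\partial x}\,p\,d\Omega_e + \int_{\Omega_e}\nu\,\nabla N_i\cdot\nabla u\,d\Omega_e + \underbrace{\tau_{SUPG}\int_{\Omega_e}(\mathbf{u}\cdot\nabla N_i)\left(\frac{\partial u}{\partial t} + (\mathbf{u}\cdot\nabla)u + w\frac{\partial u}{\partial z} + \frac{\partial p}{\partial x} - \underline{\nu\nabla^2 u} - \nu\frac{\partial^2 u}{\partial z^2}\right)d\Omega_e}_{\text{SUPG}}\Bigg] = 0 \tag{4.15}$$

$$\sum_e\Bigg[\int_{\Omega_e} N_i\left(\frac{\partial v}{\partial t} + (\mathbf{u}\cdot\nabla)v + w\frac{\partial v}{\partial z} - \nu\frac{\partial^2 v}{\partial z^2}\right)d\Omega_e - \int_{\Omega_e}\frac{\partial N_i}{\partial y}\,p\,d\Omega_e + \int_{\Omega_e}\nu\,\nabla N_i\cdot\nabla v\,d\Omega_e + \underbrace{\tau_{SUPG}\int_{\Omega_e}(\mathbf{u}\cdot\nabla N_i)\left(\frac{\partial v}{\partial t} + (\mathbf{u}\cdot\nabla)v + w\frac{\partial v}{\partial z} + \frac{\partial p}{\partial y} - \underline{\nu\nabla^2 v} - \nu\frac{\partial^2 v}{\partial z^2}\right)d\Omega_e}_{\text{SUPG}}\Bigg] = 0 \tag{4.16}$$

$$\sum_e\Bigg[\int_{\Omega_e} N_i\left(\frac{\partial w}{\partial t} + (\mathbf{u}\cdot\nabla)w + w\frac{\partial w}{\partial z} + \frac{\partial p}{\partial z} - \nu\frac{\partial^2 w}{\partial z^2}\right)d\Omega_e + \int_{\Omega_e}\nu\,\nabla N_i\cdot\nabla w\,d\Omega_e + \underbrace{\tau_{SUPG}\int_{\Omega_e}(\mathbf{u}\cdot\nabla N_i)\left(\frac{\partial w}{\partial t} + (\mathbf{u}\cdot\nabla)w + w\frac{\partial w}{\partial z} + \frac{\partial p}{\partial z} - \underline{\nu\nabla^2 w} - \nu\frac{\partial^2 w}{\partial z^2}\right)d\Omega_e}_{\text{SUPG}}\Bigg] = 0 \tag{4.17}$$

$$\sum_e\Bigg[\int_{\Omega_e} N_i\left(\nabla\cdot\mathbf{u} + \frac{\partial w}{\partial z}\right)d\Omega_e + \underbrace{\tau_{PSPG}\int_{\Omega_e}\nabla N_i\cdot\left(\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u} + w\frac{\partial \mathbf{u}}{\partial z} + \nabla p - \underline{\nu\nabla^2\mathbf{u}} - \nu\frac{\partial^2\mathbf{u}}{\partial z^2}\right)d\Omega_e}_{\text{PSPG}}\Bigg] = 0 \tag{4.18}$$

where the SUPG and PSPG stabilization terms have been identified via an underbrace. The
underlined terms in the above expressions vanish for a P1 element approximation.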

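Now a Fourier decomposition in the z-direction is performed at the discrete level (equations
(4.15)-(4.18)). After performing the DFT, one obtains the following:

$$\sum_e\left[\int_{\Omega_e} N_i\frac{\partial\hat{u}_k}{\partial t}\,d\Omega_e + (\hat{h}_{xk})_i - \int_{\Omega_e}\frac{\partial N_i}{\partial x}\,\hat{p}_k\,d\Omega_e + \int_{\Omega_e}\nu\left(\nabla N_i\cdot\nabla\hat{u}_k + (2\pi k/L)^2 N_i\hat{u}_k\right)d\Omega_e + (ST_{SUPG\,xk})_i\right] = 0 \tag{4.19}$$

$$\sum_e\left[\int_{\Omega_e} N_i\frac{\partial\hat{v}_k}{\partial t}\,d\Omega_e + (\hat{h}_{yk})_i - \int_{\Omega_e}\frac{\partial N_i}{\partial y}\,\hat{p}_k\,d\Omega_e + \int_{\Omega_e}\nu\left(\nabla N_i\cdot\nabla\hat{v}_k + (2\pi k/L)^2 N_i\hat{v}_k\right)d\Omega_e + (ST_{SUPG\,yk})_i\right] = 0 \tag{4.20}$$

$$\sum_e\left[\int_{\Omega_e} N_i\frac{\partial\hat{w}_k}{\partial t}\,d\Omega_e + (\hat{h}_{zk})_i + \frac{2\pi I k}{L}\int_{\Omega_e} N_i\,\hat{p}_k\,d\Omega_e + \int_{\Omega_e}\nu\left(\nabla N_i\cdot\nabla\hat{w}_k + (2\pi k/L)^2 N_i\hat{w}_k\right)d\Omega_e + (ST_{SUPG\,zk})_i\right] = 0 \tag{4.21}$$

$$\sum_e\left[\int_{\Omega_e} N_i(\nabla\cdot\hat{\mathbf{u}}_k)\,d\Omega_e + \frac{2\pi I k}{L}\int_{\Omega_e} N_i\,\hat{w}_k\,d\Omega_e + \tau_{PSPG}\int_{\Omega_e}\nabla N_i\cdot\left(\frac{\partial\hat{\mathbf{u}}_k}{\partial t} + \nabla\hat{p}_k + \nu(2\pi k/L)^2\hat{\mathbf{u}}_k\right)d\Omega_e + (ST_{PSPG\,ck})_i\right] = 0 \tag{4.22}$$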

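where the convective terms are

$$(\hat{h}_{xk})_i = \sum_{n=0}^{N-1}\left\{\int_{\Omega_e} N_i\left[(\mathbf{u}_n\cdot\nabla)u_n + w_n\left(\frac{\partial u}{\partial z}\right)_n\right]d\Omega_e\right\} e^{-2\pi I nk/N},$$

with $(\hat{h}_{yk})_i$ and $(\hat{h}_{zk})_i$ obtained by replacing $u$ with $v$ and $w$, respectively.
The PSPG stabilization of the convective terms is

$$(ST_{PSPG\,ck})_i = \sum_{n=0}^{N-1}\left\{\tau_{PSPG}\int_{\Omega_e}\nabla N_i\cdot\left[(\mathbf{u}_n\cdot\nabla)\mathbf{u}_n + w_n\left(\frac{\partial\mathbf{u}}{\partial z}\right)_n\right]d\Omega_e\right\} e^{-2\pi I nk/N},$$

and the SUPG convective stabilization terms are

$$(ST_{SUPG\,xk})_i = \sum_{n=0}^{N-1}\left\{\tau_{SUPG}\int_{\Omega_e}(\mathbf{u}_n\cdot\nabla N_i)\left[\frac{\partial u_n}{\partial t} + (\mathbf{u}_n\cdot\nabla)u_n + w_n\left(\frac{\partial u}{\partial z}\right)_n + \frac{\partial p_n}{\partial x} - \nu\left(\frac{\partial^2 u}{\partial z^2}\right)_n\right]d\Omega_e\right\} e^{-2\pi I nk/N},$$

with the y and z components following by replacing $u$ with $v$ or $w$ and $\partial p/\partial x$ with
$\partial p/\partial y$ or $\partial p/\partial z$.

As before, the evaluation of the transformed convective terms has not been performed.
These terms are treated explicitly in time using a pseudo-spectral approach described in
Section 4.5.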

4.4.1  Nodal  Matrix  Equation 

As with the 2D solver, a node-by-node ordering of the DOFs is used, in which all velocity
and pressure unknowns at a node are ordered one after the other to form a small 8-component
vector. This ordering results in a block-structured system matrix. We can now speak in terms
of the small characteristic matrix equation associated with each node. The resulting nodal
matrix equation is similar to its 2D counterpart, with two exceptions. First, there exist some
new "Helmholtz coefficients," $A_1$ and $A_2$, that arise due to derivatives taken in the z-direction.
Second, the nodal stiffness matrix is no longer 3x3, but rather 8x8. This is a consequence of
the additional z-velocity component and of the fact that each Fourier coefficient has a real
and an imaginary component, both of which must be computed.

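To assist in implementation, the matrix equation is written in terms of the change in
flow parameters during each time step,

$$\delta\hat{\mathbf{u}}_k = \hat{\mathbf{u}}_k^{n+1} - \hat{\mathbf{u}}_k^{n} \tag{4.30}$$

$$\delta\hat{w}_k = \hat{w}_k^{n+1} - \hat{w}_k^{n} \tag{4.31}$$

$$\delta\hat{p}_k = \hat{p}_k^{n+1} - \hat{p}_k^{n}, \tag{4.32}$$

rather than the flow parameters themselves, $\hat{\mathbf{u}}_k$, $\hat{w}_k$, and $\hat{p}_k$. Using $\Re$ to denote the real
component and $\Im$ to denote the imaginary, the 3D nodal stiffness matrix, with unknowns
ordered $\{\Re(\delta\hat{u}_k)_j, \Im(\delta\hat{u}_k)_j, \Re(\delta\hat{v}_k)_j, \Im(\delta\hat{v}_k)_j, \Re(\delta\hat{w}_k)_j, \Im(\delta\hat{w}_k)_j, \Re(\delta\hat{p}_k)_j, \Im(\delta\hat{p}_k)_j\}$, is

$$\begin{bmatrix}
K_{ij} & 0 & 0 & 0 & 0 & 0 & {}^{x}q_{ij} & 0\\
0 & K_{ij} & 0 & 0 & 0 & 0 & 0 & {}^{x}q_{ij}\\
0 & 0 & K_{ij} & 0 & 0 & 0 & {}^{y}q_{ij} & 0\\
0 & 0 & 0 & K_{ij} & 0 & 0 & 0 & {}^{y}q_{ij}\\
0 & 0 & 0 & 0 & K_{ij} & 0 & 0 & A_{2ij}\\
0 & 0 & 0 & 0 & 0 & K_{ij} & -A_{2ij} & 0\\
{}^{x}Q_{ij} & 0 & {}^{y}Q_{ij} & 0 & 0 & A_{2ij} & S_{ij} & 0\\
0 & {}^{x}Q_{ij} & 0 & {}^{y}Q_{ij} & -A_{2ij} & 0 & 0 & S_{ij}
\end{bmatrix}
\begin{Bmatrix}
\Re(\delta\hat{u}_k)_j\\ \Im(\delta\hat{u}_k)_j\\ \Re(\delta\hat{v}_k)_j\\ \Im(\delta\hat{v}_k)_j\\ \Re(\delta\hat{w}_k)_j\\ \Im(\delta\hat{w}_k)_j\\ \Re(\delta\hat{p}_k)_j\\ \Im(\delta\hat{p}_k)_j
\end{Bmatrix}$$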

—{*}>§{*}  :-m:1  «*> 
where the 'Helmholtz' and related coefficients are

$$A_{1ij} = -\nu(2\pi k/L)^2\, m_{ij} \tag{4.45}$$

$$A^{xS}_{1ij} = -\nu(2\pi k/L)^2\, m^{xS}_{ij} \tag{4.46}$$

$$A^{yS}_{1ij} = -\nu(2\pi k/L)^2\, m^{yS}_{ij} \tag{4.47}$$

$$A_{2ij} = (2\pi k/L)\, m_{ij}. \tag{4.48}$$


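The other coefficients appearing in (4.33) are the same as for the 2D case. That is, the
Galerkin terms are

$$m_{ij} = \int_{\Omega_e} N_i N_j\,d\Omega_e \tag{4.49}$$

$$k_{ij} = \nu\int_{\Omega_e}\left(\frac{\partial N_i}{\partial x}\frac{\partial N_j}{\partial x} + \frac{\partial N_i}{\partial y}\frac{\partial N_j}{\partial y}\right)d\Omega_e \tag{4.50}$$

$${}^{x}q_{ij} = -\int_{\Omega_e}\frac{\partial N_i}{\partial x}N_j\,d\Omega_e \tag{4.51}$$

$${}^{y}q_{ij} = -\int_{\Omega_e}\frac{\partial N_i}{\partial y}N_j\,d\Omega_e \tag{4.52}$$

and the PSPG terms are

$$m^{xS}_{ij} = \int_{\Omega_e}\tau_{PSPG}\frac{\partial N_i}{\partial x}N_j\,d\Omega_e \tag{4.53}$$

$$m^{yS}_{ij} = \int_{\Omega_e}\tau_{PSPG}\frac{\partial N_i}{\partial y}N_j\,d\Omega_e \tag{4.54}$$

$$k^{xS}_{ij} = -\nu\int_{\Omega_e}\tau_{PSPG}\frac{\partial N_i}{\partial x}\nabla^2 N_j\,d\Omega_e \tag{4.55}$$

$$k^{yS}_{ij} = -\nu\int_{\Omega_e}\tau_{PSPG}\frac{\partial N_i}{\partial y}\nabla^2 N_j\,d\Omega_e \tag{4.56}$$

$$q^{S}_{ij} = \tau_{PSPG}\int_{\Omega_e}\left(\frac{\partial N_i}{\partial x}\frac{\partial N_j}{\partial x} + \frac{\partial N_i}{\partial y}\frac{\partial N_j}{\partial y}\right)d\Omega_e. \tag{4.57}$$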

4.4.2  Analytical  Coefficient  Evaluation 

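Because linear triangle elements are used, each of the coefficients appearing in the system
matrix (4.33) can be evaluated analytically. The 'Helmholtz' and related coefficients become

$$A_{1ij} = -\nu(2\pi k/L)^2\,\frac{S_T}{12}(1 + \delta_{ij}) \tag{4.58}$$

$$A^{xS}_{1ij} = -\tau_{PSPG}\,\nu(2\pi k/L)^2\,\frac{x_{ni}}{6} \tag{4.59}$$

$$A^{yS}_{1ij} = -\tau_{PSPG}\,\nu(2\pi k/L)^2\,\frac{y_{ni}}{6} \tag{4.60}$$

$$A_{2ij} = (2\pi k/L)\,\frac{S_T}{12}(1 + \delta_{ij}). \tag{4.61}$$

The Galerkin terms are

$$m_{ij} = \frac{S_T}{12}(1 + \delta_{ij}) \tag{4.62}$$

$$k_{ij} = \frac{\nu}{4S_T}(x_{ni}x_{nj} + y_{ni}y_{nj}) \tag{4.63}$$

$${}^{x}q_{ij} = -\frac{x_{ni}}{6} \tag{4.64}$$

$${}^{y}q_{ij} = -\frac{y_{ni}}{6}, \tag{4.65}$$

and the PSPG terms are

$$m^{xS}_{ij} = \tau_{PSPG}\,\frac{x_{ni}}{6} \tag{4.66}$$

$$m^{yS}_{ij} = \tau_{PSPG}\,\frac{y_{ni}}{6} \tag{4.67}$$

$$k^{xS}_{ij} = 0 \tag{4.68}$$

$$k^{yS}_{ij} = 0 \tag{4.69}$$

$$q^{S}_{ij} = \frac{\tau_{PSPG}}{4S_T}(x_{ni}x_{nj} + y_{ni}y_{nj}), \tag{4.70}$$

where $S_T$ is the area of the triangle element, and $\delta$ is the Kronecker delta operator. The
PSPG stabilization of the diffusion terms, $k^{xS}_{ij}$ and $k^{yS}_{ij}$, are zero due to the inability of the
P1/P1 element to possess non-zero second derivatives.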
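
The closed-form mass coefficient $(S_T/12)(1+\delta_{ij})$ can be cross-checked against numerical quadrature. The sketch below is illustrative only: the triangle vertices, viscosity, mode number and span are hypothetical values, and the edge-midpoint rule is chosen simply because it integrates the quadratic product $N_iN_j$ exactly on a linear triangle.

```python
import math


def triangle_area(vertices):
    """Area S_T of a triangle given as three (x, y) vertex pairs."""
    (x1, y1), (x2, y2), (x3, y3) = vertices
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))


def mass_analytic(area):
    """Closed form m_ij = (S_T / 12) * (1 + delta_ij)."""
    return [[area / 12.0 * (2.0 if i == j else 1.0) for j in range(3)]
            for i in range(3)]


def mass_by_quadrature(area):
    """Integrate N_i * N_j with the edge-midpoint rule (exact for quadratics)."""
    # Values of (N_1, N_2, N_3) at the three edge midpoints.
    shape_at_midpoints = [(0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5)]
    return [[area / 3.0 * sum(n[i] * n[j] for n in shape_at_midpoints)
             for j in range(3)] for i in range(3)]


def helmholtz_coefficients(area, nu, k, span):
    """A1_ij = -nu (2 pi k / L)^2 m_ij and A2_ij = (2 pi k / L) m_ij."""
    wavenumber = 2.0 * math.pi * k / span
    m = mass_analytic(area)
    a1 = [[-nu * wavenumber ** 2 * m_ij for m_ij in row] for row in m]
    a2 = [[wavenumber * m_ij for m_ij in row] for row in m]
    return a1, a2


# Hypothetical element and flow parameters.
vertices = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
area = triangle_area(vertices)
m_exact = mass_analytic(area)
m_quad = mass_by_quadrature(area)
max_difference = max(abs(m_exact[i][j] - m_quad[i][j])
                     for i in range(3) for j in range(3))
a1, a2 = helmholtz_coefficients(area, nu=0.01, k=3, span=2.0)
```

Because the shape functions sum to one, the entries of the mass matrix sum to the element area, which gives a second quick consistency check.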

4.4.3  Fourier  Symmetry 

The need to solve for both real and imaginary components of the Fourier coefficients adds
considerable computational cost and detracts from the appeal of the spectral decomposition.
However, this cost is compensated for in two ways. First, the higher accuracy of the
spectral decomposition allows one to employ fewer nodes in the transverse direction than if
a FE or FD scheme were used. Second, because it is known that the physical-space variables
are purely real, there are simplifications that can be made to reduce the amount of
computation required to solve the system. These simplifications come about because when the
physical-space values, $q_n$, are purely real, the following symmetry appears in Fourier space:

$$\hat{q}(-f) = [\hat{q}(f)]^* \tag{4.71}$$

where $[\,]^*$ denotes the complex conjugate. Consequently, the matrix equation need only be
solved for the first N/2 + 1 Fourier modes, and the other modes can be evaluated from the
symmetry condition. In addition, modes 0 and N/2 are purely real and the imaginary
components can therefore be eliminated from the system. These simplifications are summarized
in Figure 4.1.
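
The symmetry can be verified on a small example. The following sketch assumes the DFT convention of (4.11) and uses the discrete form of the symmetry, in which the negative frequency $-f$ corresponds to index $N-k$ by periodicity; the sample values are hypothetical.

```python
import cmath


def dft(samples):
    """Forward DFT: q_hat[k] = sum_n q[n] * exp(-2*pi*I*n*k/N)."""
    n_pts = len(samples)
    return [sum(samples[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]


# Hypothetical, purely real physical-space samples (N = 8).
q = [0.3, -1.2, 2.5, 0.7, -0.4, 1.1, 0.0, -2.0]
N = len(q)
q_hat = dft(q)

# Keep only the first N/2 + 1 modes, as if only those had been solved for.
half = q_hat[:N // 2 + 1]

# Recover modes N/2 + 1 ... N - 1 from the conjugate symmetry.
rebuilt = half + [half[N - k].conjugate() for k in range(N // 2 + 1, N)]

symmetry_error = max(abs(a - b) for a, b in zip(q_hat, rebuilt))
imag_mode_0 = abs(q_hat[0].imag)
imag_mode_half = abs(q_hat[N // 2].imag)
```

Only N/2 + 1 of the N coefficients are stored in `half`, and the imaginary parts of the first and last of these vanish to round-off, which mirrors the reduction in unknowns described above.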




4.5  Treatment  of  the  Convective  Terms 

4.5.1  Temporal  Treatment 

As with the 2D solver, the convective terms are treated explicitly in the 3D solver. In
addition to removing the nonlinearity from the matrix equation within each time step, it is
seen from equations (4.12)-(4.14) or (4.19)-(4.22) that the Fourier modes are also completely
decoupled within each time step since all coupling takes place in the convective terms. This
is a major computational advantage because it converts the 3D problem into a series
of independent 2D problems. The extension of the 2D code developed in Chapter 2 to 3D
is rather direct because the basic building blocks (2D stabilized FE solvers) are already in
place.

Fig. 4.1 - Illustration of the symmetry properties of the Fourier coefficients when the
physical-space values are purely real. [Diagram: modes k = 0 and k = N/2 are real; modes
k = 1, ..., N/2 - 1 are real + imaginary; modes k = N/2 + 1, ..., N - 1 follow from the
symmetry condition.]


4.5.2  Pseudo-Spectral  Approach 

Thus far the evaluation of the convective terms has not been discussed. The most
straightforward approach is to directly evaluate the discrete sums appearing in (4.23)-(4.29).
However, this approach is computationally expensive because at least O(N) products must be
evaluated for each Fourier component, where N is the number of discrete Fourier modes.
This results in O(N^2) product evaluations for each degree of freedom.

Alternatively, a pseudo-spectral approach first introduced by Orszag (Orszag, 1969, 1971a)
is employed. In this approach, the weighted residual form of the convective terms is
evaluated in physical space and then transformed back into Fourier space for use in the next
time step. Referring to Figure 4.2, knowing the Fourier components of the velocity DOFs
$(\hat{\mathbf{u}}_k, \hat{w}_k)_i$, one can easily obtain the physical-space quantities $(\mathbf{u}_n, w_n)_i$ at N equi-spaced
transverse points via a discrete inverse Fourier transform. Knowing the physical-space velocity
DOFs, the convective terms $(\mathbf{H}_n, H_{zn}, H_{pn})_i$ can then be computed using a combination
of finite elements and finite differences. Upon evaluation of the convective terms at each 2D
plane, their corresponding Fourier coefficients $(\hat{\mathbf{H}}_k, \hat{H}_{zk}, \hat{H}_{pk})_i$ are computed using a DFT.
Assuming the number of transverse planes is a power of two, the terms can now be evaluated
at a cost of three fast Fourier transforms (FFTs) and O(N) products, resulting in
O(N log2 N) product evaluations for each DOF.
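
The equivalence of the two routes can be illustrated on a single product of two spanwise-sampled quantities. This is a sketch under simplifying assumptions: it treats one scalar product rather than the full weighted-residual convective terms, the coefficient values are hypothetical, and the transforms are written as direct sums for clarity, whereas a production code would call an FFT to obtain the O(N log2 N) cost.

```python
import cmath


def dft(samples):
    """Forward DFT with the sign convention of (4.11)."""
    n_pts = len(samples)
    return [sum(samples[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]


def idft(coeffs):
    """Inverse DFT with the 1/N normalization of (4.9)."""
    n_pts = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
                for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def product_by_convolution(u_hat, w_hat):
    """Direct route: circular convolution, O(N) products per mode."""
    n_pts = len(u_hat)
    return [sum(u_hat[m] * w_hat[(k - m) % n_pts] for m in range(n_pts)) / n_pts
            for k in range(n_pts)]


def product_pseudo_spectral(u_hat, w_hat):
    """Pseudo-spectral route: inverse transform, multiply pointwise, transform back."""
    u = idft(u_hat)
    w = idft(w_hat)
    return dft([a * b for a, b in zip(u, w)])


# Hypothetical physical-space samples, transformed to obtain Fourier coefficients.
u_hat = dft([1.0, 0.5, -0.3, 2.0, 0.0, -1.5, 0.8, 0.1])
w_hat = dft([0.2, -0.7, 1.1, 0.4, -0.9, 0.6, 0.0, 1.3])

direct = product_by_convolution(u_hat, w_hat)
pseudo = product_pseudo_spectral(u_hat, w_hat)
max_mismatch = max(abs(a - b) for a, b in zip(direct, pseudo))
```

Both routes give the same Fourier coefficients of the product; the pseudo-spectral route replaces the N products per mode of the convolution with two inverse transforms, N pointwise products, and one forward transform.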

Fig. 4.2 - Illustration of the pseudo-spectral approach utilized in evaluation of the nonlinear
terms. [Diagram panels: Physical Space, Fourier Space.]

The procedure for calculating the physical-space convective terms $(\mathbf{H}_n, H_{zn}, H_{pn})_i$, knowing
the velocity DOFs at N equi-spaced transverse planes, is now explained in detail. First,
the terms that must be evaluated are

$$(\mathbf{H}_n)_i = (H_{xn}, H_{yn})_i, \tag{4.72}$$

$$(H_{xn})_i = (h_{xn})_i + (ST_{SUPG\,xn})_i \tag{4.73}$$

$$(H_{yn})_i = (h_{yn})_i + (ST_{SUPG\,yn})_i \tag{4.74}$$

$$(H_{zn})_i = (h_{zn})_i + (ST_{SUPG\,zn})_i \tag{4.75}$$

$$(H_{pn})_i = (ST_{PSPG\,cn})_i. \tag{4.76}$$

The components of the above expressions, (4.77)-(4.83), are the element integrals of
(4.23)-(4.29) evaluated at plane n, that is, without the Fourier sum and exponential factor.


The transverse derivatives $(\partial/\partial z)_n$, $(\partial^2/\partial z^2)_n$ are calculated using second-order central
differences. That is,

$$\left(\frac{\partial f}{\partial z}\right)_n = \frac{f_{right} - f_{left}}{2\Delta z} \tag{4.84}$$

$$\left(\frac{\partial^2 f}{\partial z^2}\right)_n = \frac{f_{right} - 2f_n + f_{left}}{\Delta z^2}, \tag{4.85}$$

where $\Delta z$ is the spacing between the 2D planes,

$$\Delta z = \frac{L}{N}, \tag{4.86}$$

and L is the transverse dimension of the computational domain and N is the number of
Fourier modes. The values of $f_{left}$ and $f_{right}$ are

$$f_{left} = f_{n-1}, \quad f_{right} = f_{n+1} \tag{4.87}$$

at all but the boundary planes n = 0 and n = N - 1. At these planes the periodicity condition
results in

$$f_{left} = f_{N-1}, \quad f_{right} = f_1 \tag{4.88}$$

for the n = 0 plane and

$$f_{left} = f_{N-2}, \quad f_{right} = f_0 \tag{4.89}$$

for the n = N - 1 plane. These spanwise derivative values calculated from (4.84) and (4.85)
are then treated as any other nodal DOF in computing the weighted residual form of the
convective terms.
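
The stencil and its periodic wrap-around can be sketched compactly with modular indexing, which reproduces the special cases (4.88) and (4.89) without separate branches. This is an illustrative sketch, not the solver's routine; the test function sin(z) on a domain of length 2*pi with 64 planes is a hypothetical choice made so that the exact derivatives are known.

```python
import math


def periodic_central_differences(f, length):
    """Second-order central differences on N equi-spaced periodic planes.

    Returns the first and second transverse derivatives at every plane.
    """
    n_pts = len(f)
    dz = length / n_pts  # spacing between planes, eq. (4.86)
    first, second = [], []
    for n in range(n_pts):
        f_left = f[(n - 1) % n_pts]   # wraps to f[N-1] at n = 0
        f_right = f[(n + 1) % n_pts]  # wraps to f[0] at n = N-1
        first.append((f_right - f_left) / (2.0 * dz))
        second.append((f_right - 2.0 * f[n] + f_left) / dz ** 2)
    return first, second


# Hypothetical periodic test function with known derivatives.
L = 2.0 * math.pi
N = 64
z = [n * L / N for n in range(N)]
f = [math.sin(zn) for zn in z]

df, d2f = periodic_central_differences(f, L)
err_first = max(abs(a - math.cos(zn)) for a, zn in zip(df, z))
err_second = max(abs(a + math.sin(zn)) for a, zn in zip(d2f, z))
```

Both errors scale with the square of the plane spacing, as expected for a second-order stencil, and the boundary planes are as accurate as the interior ones because of the periodic wrap.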

4.5.3  Nodal Convective Load Vector

Referring to (4.33), the characteristic nodal matrix equation associated with each node,
the nodal convective load vector is

$$\{\hat{\mathbf{H}}_k\}_i = \sum_{n=0}^{N-1}\{\mathbf{H}_n\}_i\, e^{-2\pi I nk/N} \tag{4.90}$$

where

$$\{\mathbf{H}_n\}_i = \begin{Bmatrix}(H_{xn})_i\\ (H_{yn})_i\\ (H_{zn})_i\\ (H_{pn})_i\end{Bmatrix} \tag{4.91}$$

and $(H_{xn})_i$, $(H_{yn})_i$, $(H_{zn})_i$, and $(H_{pn})_i$ are given in (4.73)-(4.76).

4.5.4  Analytical  Coefficient  Evaluation 

Because  linear  triangle  elements  are  used,  each  of  the  coefficients  appearing  in  the  convec¬ 
tive  load  vector  (4.91)  can  be  easily  evaluated  analytically.  After  integration  of  (4.77)-(4.83) 
one  obtains 


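$$(h_{xn})_i = \frac{u_{nj}}{24}(1 + \delta_{ik})(u_{nk}x_{nj} + v_{nk}y_{nj}) + \frac{S_T}{60}\max[1,\, 2(\delta_{ij} + \delta_{ik} + \delta_{jk})]\,w_{nk}\left(\frac{\partial u}{\partial z}\right)_{nj}$$

$$(h_{yn})_i = \frac{v_{nj}}{24}(1 + \delta_{ik})(u_{nk}x_{nj} + v_{nk}y_{nj}) + \frac{S_T}{60}\max[1,\, 2(\delta_{ij} + \delta_{ik} + \delta_{jk})]\,w_{nk}\left(\frac{\partial v}{\partial z}\right)_{nj}$$

$$(h_{zn})_i = \frac{w_{nj}}{24}(1 + \delta_{ik})(u_{nk}x_{nj} + v_{nk}y_{nj}) + \frac{S_T}{60}\max[1,\, 2(\delta_{ij} + \delta_{ik} + \delta_{jk})]\,w_{nk}\left(\frac{\partial w}{\partial z}\right)_{nj}$$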

1  r  .  Siij 

24(1  +  ^)aT+ 


nj 


{STSUPGxn)i  "^sUPG^nk  “t"  Vnk  Tli) 

^^-unj{unk  nj  +  vnkyrij )  +  — (1  +  5jk)  1 


12  ST 
xn 


v  .  / d2u\ 

i2&Pnj~24{1  +  5jk){dz>)nj. 


(4.92) 


(4.93) 


(4.94) 


(4.95) 




1  ,,  .  .  (Ju,- 

a(1+{it)Ar+ 


i^^SUPG  V »)*  Tsc/pc(^^fc  "J“  ^nfc  ^^Ij) 

Y^-UniKfc  Xnj  +  Unit  %•)  +  ^(1  +  fyfe) 


yrij  */„  .  .  /W\ 

?_Pni  0/1  (-1  +  ^'fc)  (^^2  J 


12Sr'"‘*'  24 

{^'^'supg  zn)i  ^sc/Pc(W”fc  4"  ^nk^^i) 


1  „  -  .  foa,- 

m(1+^‘)at+ 


24(1  +  ^)l.^J„,  24(1  +  {'"\a?;nlJ 


1 


{STpspcCn)i  ~  tpspg  2^  (Wjl nj  "I"  Vk  Vnj)(unj  X'T>-i  +  Vj  V7li)-{- 

s" *«(•*(£)./>*(£).,)• 

where $S_T$ is the area of the triangle element and $\delta$ is the Kronecker delta operator.


(4.96) 


+  unfc  %)  +  ~{l  +  Sjk)  (wj^J  +  (4.97) 

4.6  Parallelization

Though  LES  is  a  compromise  between  RANS  and  DNS,  it  is  still  rather  computationally 
expensive  due  to  the  fine  mesh  and  small  time  steps  required.  In  the  case  of  the  solver 
developed  in  this  work,  where  an  unstructured  mesh  is  utilized,  parallelization  is  required  in 
order  to  reduce  the  computation  time  to  a  reasonable  level. 

In this section, the parallelization theory and implementation are discussed. There are
three sections of the algorithm that are parallelized. Referring to Figure 4.10, these sections
are 1) evaluation of Fourier coefficients, 2) evaluation of convective terms, and 3) transformation
of flowfield information to/from Fourier space. The parallelization theory as it applies to
each of these sections is presented first, followed by the implementation in terms
of OpenMP on a shared-memory computer.

4.6.1  Parallelization  Theory 

4.6.1.1  Evaluation of Fourier Coefficients

The  majority  of  the  computation  time  is  spent  setting  up  and  solving  the  linear  systems 
that  arise  for  each  of  the  Fourier  modes  at  each  time  step.  Consequently,  parallelization  of 
this  portion  of  the  algorithm  is  most  important  and  beneficial. 

The typical approach taken to parallelize solvers of this type is to partition the
computational mesh in physical space, as shown in Figure 4.3(a). That is, the mesh is divided into
NP subregions (where NP is the number of processors available), and the work associated
with each subregion (a submatrix in the overall system matrix) is assigned to a different thread.
One drawback to this approach is that the partitioning procedure can be complicated and
expensive. Another is that the convergence characteristics and accuracy can change depending
upon the partitioning. Yet another is that the submatrices are coupled to one another
through the boundaries of the subregions, and information must therefore be passed between
processors during the solution of the system.


Fig.  4.3  -  Approaches  to  partitioning  for  parallelization:  (a)  partitioning  in  physical  space, 
(b)  partitioning  in  Fourier  space. 

Because  the  discretization  procedure  used  in  this  work  produces  an  independent  matrix 
equation  for  each  Fourier  mode  rather  than  a  single  large  coupled  system,  a  novel  approach 
is  taken.  Rather  than  partitioning  in  physical  space,  the  problem  is  partitioned  in  Fourier 
space.  As  illustrated  in  Figure  4.3(b),  this  means  that  each  Fourier  mode  is  treated  as  a 
different  thread  and  can  be  calculated  on  a  different  processor.  It  is  important  to  understand 
that  rather  than  using  a  parallel  matrix  equation  solver  as  is  typically  done,  NP  instances  of 
a  serial  matrix  solver,  each  solving  a  different  Fourier  mode,  are  called  simultaneously. 

This  approach  is  free  from  the  drawbacks  of  the  physical  space  partitioning  mentioned 
above.  First,  the  partitioning  process  is  no  longer  complicated  or  expensive.  In  fact,  it  is 
virtually  free  because  it  occurs  as  a  natural  by-product  of  the  FE/spectral  discretization 
technique.  In  addition,  because  the  partitions  (Fourier  modes)  are  completely  decoupled 
within  each  time  step,  accuracy  and  convergence  are  not  affected  by  parallelization.  Also  due 
to  the  decoupling  of  the  partitions,  there  is  no  need  for  communication  between  processors 
during  the  system  solution. 
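
The structure of this partitioning can be sketched as a pool of workers, each calling a serial solver on a different Fourier mode. Everything in the sketch is a stand-in chosen for illustration: the 3x3 matrices `STIFF` and `MASS`, the right-hand side, the span, and the dense Gaussian elimination routine are hypothetical and far smaller than a real 2D finite-element system, and a Python thread pool is used only to show the control flow, not to represent OpenMP performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def solve_dense(a, b):
    """Serial solver: Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Hypothetical stand-ins for the in-plane operators and load vector.
STIFF = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
MASS = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
RHS = [1.0, 0.0, 1.0]
SPAN = 1.0      # transverse dimension L
N_MODES = 8     # number of Fourier modes N


def solve_mode(k):
    """Assemble and solve the independent system belonging to mode k."""
    shift = (2.0 * math.pi * k / SPAN) ** 2  # Helmholtz-type shift for mode k
    a = [[STIFF[i][j] + shift * MASS[i][j] for j in range(3)] for i in range(3)]
    return k, solve_dense(a, RHS)


modes = range(N_MODES // 2 + 1)  # only N/2 + 1 modes need to be solved

# NP = 3 workers, each running the serial solver on a different mode.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = dict(pool.map(solve_mode, modes))

# Reference: the same systems solved one after another.
serial = dict(solve_mode(k) for k in modes)
```

Because no data is exchanged between modes during the solve, the parallel and serial results are identical, which reflects the statement that accuracy and convergence are unaffected by the partitioning.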

The parallelization technique is not without its drawbacks, however. As mentioned
previously, though the linear system solution takes place in Fourier space, the convective
terms are treated with a pseudo-spectral approach, meaning these terms are evaluated in
physical space. Consequently, the flowfield must first be passed through an inverse DFT.
This means that at the end of each time step, flowfield information from all Fourier
modes (all processors) is required in order to calculate the convective terms needed for the
following time step. Passing this amount of data between processors on a distributed-memory
machine is cost-prohibitive. Another drawback is that load imbalance can be an issue. Load
imbalance occurs because a parallel region will not complete until all processors finish their
work. If some processors have more work to complete than others, performance suffers. In
this algorithm, load imbalance is a problem because there are relatively few "work chunks"
(Fourier modes) with possibly varying computational costs. That is, there are only N/2 + 1
linear systems to solve (where N is the number of Fourier modes) and, depending upon the
level to which a particular mode is excited, a widely varying amount of time may be required
to solve each system. Both of these drawbacks must be addressed in order to successfully
implement this parallelization scheme.

4.6.1.2  Evaluation of the Convective Terms

The  parallelization  concept  for  this  section  of  the  solver  is  similar  to  that  of  the  Fourier 
coefficient  evaluation,  except  that  the  computations  are  no  longer  in  Fourier  space,  but  in 
physical  space.  The  problem  is  partitioned  by  considering  each  2D  plane  as  a  separate  task. 
These  tasks  are  coupled  in  that  values  at  the  two  neighboring  planes  are  required  for  the 
FD  evaluation  of  the  transverse  derivatives.  However,  this  coupling  is  of  an  explicit  nature 
because  the  evaluation  of  the  convective  terms  at  one  plane  is  only  dependent  on  previously 
known  values  at  the  neighboring  planes,  not  the  evaluated  convective  terms  at  those  planes. 
Consequently,  any  plane  can  be  solved  by  any  thread  and  in  any  order. 

4.6.1.3  Forward/Inverse Fourier Transform

Each node in the 2D plane results in an independent FFT/IFFT procedure. Though
parallel FFT algorithms are common, their use is inappropriate for this application. Instead,
for this section of the solver the work is partitioned in the 2D plane, similar to Figure 4.3(a).
In this way, NP instances of a serial FFT/IFFT routine are called simultaneously, and the
transforms are performed in a node-by-node manner within each partition. This procedure
does not suffer from any of the accuracy issues mentioned previously for physical space
partitioning because each node can be transformed independently. Also, there are no complicated
partitioning algorithms required. In fact, the assignment of a particular node to a particular
thread can be completely random; the partitions need not be contiguous blocks.




4.6.2  Parallelization  Implementation 

The parallelization is implemented using the OpenMP model (Still et al., 1998; Adve
et al., 1999; Throop, 1999) because, as will be shown in this section, its features allow the
drawbacks discussed above (especially in the Fourier coefficient evaluation section) to be
addressed. OpenMP is a collection of directives, runtime library routines, and environment
variables for shared-memory parallelism using Fortran77/90/95 or C/C++. It is designed to
be a vendor-independent standard, and its development is overseen by a board consisting of
members representing most of the major computer manufacturers. Industry support for this
standard has been shown from companies such as Compaq, Hewlett-Packard, Intel, IBM,
Silicon Graphics, and Sun. Consequently, it is quickly becoming the standard for
shared-memory parallelism.

The  difference  between  a  shared-memory  computer  and  a  distributed-memory  system  is 
important  to  understand  when  developing  a  parallelization  strategy.  The  primary  distinction 
is  that  for  a  shared-memory  computer,  all  of  the  processors  are  able  to  directly  access  all 
of  the  memory  in  the  machine.  Though  there  are  a  number  of  architectures  designed  for 
shared-memory  parallelism,  from  the  OpenMP  programmer’s  point  of  view,  these  details  are 
unimportant  and  the  system  can  be  understood  sufficiently  from  the  simplified  architecture 
shown  in  Figure  4.4.  The  exception  to  this  is  in  trying  to  tune  a  code  for  optimal  performance 
on  a  particular  computer;  in  this  case  the  detailed  architecture  is  important. 




Fig.  4.4  -  A  simplified  model  of  the  shared-memory  computer  architecture,  where  each  of  the 
n  processors  has  direct  access  to  all  system  memory. 

Distributed-memory  systems  are  the  alternative  to  shared-memory  configurations.  In 
these  systems,  each  processor  is  only  capable  of  directly  addressing  memory  that  is  physically 
associated  with  it,  as  seen  from  the  simplified  architecture  shown  in  Figure  4.5.  Because 
each  processor  has  its  own  independent  memory  system,  the  programmer  must  manage  how 
the  program  data  is  distributed  to  them.  In  accessing  data  stored  in  memory  connected  to 
another  processor,  the  programmer  must  explicitly  pass  messages  through  the  interconnecting 
network.  This  is  typically  done  using  message-passing  libraries  such  as  Message  Passing 
Interface  (MPI)  (Message  Passing  Interface  Forum,  1995;  Pacheco,  1996)  and  Parallel  Virtual 
Machine  (PVM)  (Geist  et  al.,  1994). 

Fig. 4.5 - A simplified model of the distributed-memory computer architecture, where each
of the n processors has direct access only to its own memory and access to memory
located on another processor requires passing information through the interconnecting
network.

Both types of parallel machine have advantages over the other. For the shared-memory
computer, these advantages are

•  Incremental  Parallelization.  One  has  the  ability  to  easily  parallelize  small  parts  of  an 
application  at  a  time,  while  leaving  the  remaining  parts  unmodified.  This  is  important 
when  one  or  more  small  parts  of  the  code  take  large  fractions  of  the  computational 
time.  In  this  case,  a  significant  application  speedup  can  be  obtained  at  a  small  cost  in 
terms  of  coding  time. 

•  Ease of Implementation. Shared-memory parallelization is generally easier to code
because fewer of the details need be addressed by the programmer.

•  Small  Coding  Overhead.  Very  small  increases  in  code  volume  are  required  over  the 
serial  version.  Typical  values  are  2-25%  code  volume  increase  (Chandra  et  al.,  2001). 
Furthermore,  in  the  case  of  OpenMP,  parallel  codes  will  compile  and  run  on  serial 
machines  with  no  code  changes. 

The  drawbacks  to  shared-memory  machines  are 

•  Scalability. Typically, shared-memory machines are not scalable to massive numbers of
processors in the way that distributed-memory systems are.

•  Specialized  Compiler.  A  special  compiler  and  set  of  run-time  libraries  are  required. 

•  Expense. Though shared-memory computers have become much more affordable,
especially in the last decade, they are often more expensive than a similar (in terms of
processors and memory) distributed-memory system.

From the above discussion it is apparent that the algorithm developed in this work
requires a shared-memory computer. This is because, as mentioned in the previous section,
all flowfield information from all threads is required in order to calculate the convective terms
needed to advance a time step. Using a distributed-memory system, all of this data would
have to pass through the interconnecting network, which is not feasible considering the speed
of current network technology. On a shared-memory computer, however, all the threads can
access the same memory locations and the parallelization approach becomes tractable.

An  additional  advantage  to  shared-memory  machines  is  that  the  scheduling  of  work 
chunks  to  processors  need  not  be  fixed.  Take  for  example  a  problem  in  which  one  has  six 
work  chunks  to  be  solved  on  three  processors.  For  static  load  scheduling,  each  work  chunk 
is  fixed  to  a  specific  processor,  as  shown  in  Figure  4.6.  This  type  of  scheduling  is  required 
with  distributed-memory  systems.  Using  a  shared-memory  machine,  one  has  the  option 
of  implementing  dynamic  load  scheduling.  This  approach  allows  for  better  load  balancing 
because  threads  that  finish  early  “ask”  for  more  work.  In  terms  of  the  example  problem,  this 
concept  is  illustrated  in  Figure  4.7.  At  the  beginning  of  the  parallel  region,  Figure  4.7(a),  all 
work  chunks  are  pooled  together.  Each  thread  is  then  given  one  (or  more,  if  desired)  work 
chunk to solve, Figure 4.7(b). In this example, work chunk W0 requires much more time to
solve than W1 and W2. Consequently, processors P1 and P2 finish their work first and are
given another work chunk, Figure 4.7(c). Processor P2 is the first processor to become free,
and is given the last work chunk to complete, Figure 4.7(d). This dynamic load scheduling
is  applied  to  the  parallelization  of  the  Fourier  coefficient  evaluation  section  of  the  code.  The 
other  parallel  regions  are  well  balanced  and  hence  a  static  scheduling  is  used. 
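
The benefit of dynamic scheduling in the six-chunk, three-processor example can be quantified with a small simulation. The sketch below rests on assumptions that are not taken from the figures: the chunk costs are hypothetical (chosen so that W0 is much more expensive than the others), and the static assignment is assumed to be round-robin.

```python
import heapq


def static_makespan(costs, n_proc):
    """Static scheduling: chunk i is fixed to processor i % n_proc (assumed round-robin)."""
    loads = [0.0] * n_proc
    for i, cost in enumerate(costs):
        loads[i % n_proc] += cost
    return max(loads)


def dynamic_schedule(costs, n_proc):
    """Dynamic scheduling: the next chunk goes to the processor that frees up first."""
    free_at = [(0.0, p) for p in range(n_proc)]  # (time processor becomes free, id)
    heapq.heapify(free_at)
    assignment = []
    makespan = 0.0
    for cost in costs:
        t_free, proc = heapq.heappop(free_at)
        assignment.append(proc)
        done = t_free + cost
        makespan = max(makespan, done)
        heapq.heappush(free_at, (done, proc))
    return makespan, assignment


# Hypothetical costs for work chunks W0 ... W5; W0 is the expensive one.
costs = [6.0, 1.0, 2.0, 3.0, 1.0, 1.0]

static_time = static_makespan(costs, 3)
dynamic_time, assignment = dynamic_schedule(costs, 3)
```

With these costs the static schedule is held up by the processor that owns both W0 and W3, whereas under dynamic scheduling P1 and P2 absorb the remaining chunks while P0 works on W0, and P2 receives the last chunk.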


Fig. 4.6 - Static load scheduling: assignment of work chunks to processors is fixed and
determined prior to execution.


OpenMP  uses  a  fork-join  model  for  parallelization.  As  shown  in  Figure  4.8,  the  program 
initially executes as a single thread (called the master thread), which spawns a team of threads
when  a  parallel  region  is  encountered.  At  the  end  of  a  parallel  region,  all  additional  threads 
are  terminated  and  the  master  thread  continues  execution.  It  is  because  of  this  approach  that 
parallelism  can  be  added  incrementally.  Two  basic  flavors  of  parallel  regions  are  available: 
segmentation  of  the  code  into  different  sections  to  be  run  as  different  threads,  and  execution 
of  the  iterations  of  a  do-loop  in  parallel. 

Fig. 4.7 - Dynamic load scheduling: assignment of work chunks to processors occurs dynamically during execution on a first-completed-first-served basis.

Initiation and specification of parallel regions, as well as other OpenMP-related tasks, are accomplished by introducing directives into the serial source code. In free-form Fortran90/95 source code, a line that begins with the sentinel

!$omp ...

is treated as an OpenMP directive by an OpenMP compiler. Because the directives appear as comment lines to non-OpenMP compilers, correctly written code can be compiled and run on a serial machine with no code changes.
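The same idea extends to calls to the OpenMP runtime library through the conditional compilation sentinel !$, which OpenMP compilers treat as active code and other compilers treat as a comment. The sketch below is illustrative only; the module and function names are assumptions, while omp_lib and omp_get_max_threads are part of the standard OpenMP Fortran interface.

```fortran
module thread_info
  implicit none
contains

  ! Returns the number of threads available to a parallel region,
  ! or 1 when the code is compiled without OpenMP support.
  function available_threads() result(n)
    !$ use omp_lib, only: omp_get_max_threads
    integer :: n
    n = 1
    !$ n = omp_get_max_threads()
  end function available_threads

end module thread_info
```

Compiled without OpenMP, both !$ lines are plain comments and the function returns 1, so the serial build needs no source changes.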


Fig.  4.8  -  Fork-join  parallelism  model:  the  master  thread  spawns  a  team  of  threads  inside  a 
parallel  region. 

The  parallel  regions  in  the  solver  developed  in  this  work  are  all  of  the  do-loop  flavor.  In 
order  to  successfully  parallelize  a  do-loop,  the  loop  must  have  no  dependencies.  That  is,  the 
result  of  any  operation  in  a  loop  iteration  cannot  depend  on  the  result  of  any  other  iteration. 
This  means  that  the  iterations  can  be  executed  in  any  order  and/or  by  any  processor.  The 
Fortran90  syntax  for  parallelizing  a  do-loop  is 
!$omp parallel do
do ...

end do
!$omp end parallel do


The opening directive must appear immediately before the start of the do-loop, and the
closing directive must appear immediately after the associated end do statement. Additional
clauses are typically included in order to specify those variables that are private to each
thread and those that are shared among all threads, to control how the work is divided among
the processors, to initialize private variables, etc. For example, consider the following
subroutine to perform a single Jacobi iteration in solving the Laplace equation on a rectangular
domain with Dirichlet boundary conditions using finite-differences. It is assumed that the
boundary condition values have already been assigned in the a array and that the spacing
between nodes is constant and equal in both directions. The serial routine is


subroutine jacobi_it(a,a_old,nx,ny)
  ! One Jacobi sweep over the interior nodes; boundary values of a are preset
  integer :: nx,ny,i,j
  real, dimension(nx,ny) :: a,a_old
  do i=2,nx-1
    do j=2,ny-1
      a(i,j)=0.25*(a_old(i+1,j)+a_old(i-1,j)+a_old(i,j+1)+a_old(i,j-1))
    enddo
  enddo
  return
end


Parallelization  of  this  routine  using  the  parallel  do  directive  results  in 


subroutine jacobi_it(a,a_old,nx,ny)
  ! One Jacobi sweep; iterations of the outer i-loop are shared among threads
  integer :: nx,ny,i,j
  real, dimension(nx,ny) :: a,a_old
!$omp parallel do private(i,j) shared(a,a_old,nx,ny)
  do i=2,nx-1
    do j=2,ny-1
      a(i,j)=0.25*(a_old(i+1,j)+a_old(i-1,j)+a_old(i,j+1)+a_old(i,j-1))
    enddo
  enddo
!$omp end parallel do
  return
end

The  private  clause  specifies  those  variables  that  should  be  kept  private  to  each  thread. 
In this case, each thread keeps its own copy of the counter variables i and j. The shared
clause,  on  the  other  hand,  specifies  those  variables  that  are  shared  among  all  threads. 

The explanation of OpenMP given above is, of course, simplified and incomplete. However,
it is sufficient for understanding the parallel implementation of the algorithm developed
in this work. For further information on programming with OpenMP, the reader
is referred to the book by Chandra et al. (Chandra et al., 2001) or the official OpenMP
specifications (ope, 2000, 1998).

4.6.3  Parallelization  Performance 

The performance of the parallelization procedure can be ascertained via a test case. The
algorithm described in this chapter is used to solve flow past a circular cylinder at Re = 195.
This particular problem is discussed in detail in the following chapter.
The in-plane mesh consists of approximately 17,000 nodes, and 32 Fourier modes are used
in the transverse direction. The code is run on an SGI Origin2000 computer with eight 195 MHz
R10000 processors and approximately 2.5 GB of RAM. To ascertain the parallel performance,
the simulation is run for 100 time steps using 1, 2, ..., 8 processors. Figure 4.9 shows the solver
speedup (the serial computation time divided by the parallel computation time) as a function of the
number of processors used. The major reason the actual speedup (SFE-3D) falls away from
the ideal is the previously mentioned problem of load imbalance. Though the dynamic load
scheduling helps remedy this, the work chunks are so large and so variable in cost that it is common for
the solver to be forced to wait while the last work chunk is completed. It should be noted
that as more Fourier modes are used, the parallelization becomes more efficient. It is also
apparent that the parallelization is not scalable to massive numbers of processors. In
fact, because there are only N/2 + 1 work chunks, using more than N/4 processors
is typically not cost effective, and the use of more than N/2 + 1 processors is virtually useless.
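As a rough illustration of the last point, the following bound is a sketch under a simplifying assumption that is not made in the text: all $M = N/2 + 1$ work chunks cost the same time $t_c$, and overheads are negligible. With $P$ processors the chunks are then processed in $\lceil M/P \rceil$ rounds, so the speedup $S(P)$ is limited as shown. The second line takes $N = 32$, as in the test case.

```latex
S(P) \;\le\; \frac{M\,t_c}{\lceil M/P \rceil\, t_c} \;=\; \frac{M}{\lceil M/P \rceil},
\qquad M = \frac{N}{2} + 1 ,
\\[4pt]
N = 32 \;\Rightarrow\; M = 17:\qquad
S(8) \;\le\; \frac{17}{\lceil 17/8 \rceil} = \frac{17}{3} \approx 5.7,
\qquad
S(P \ge 17) \;\le\; \frac{17}{1} = 17 .
```

Even with perfectly uniform chunks, eight processors cannot reach the ideal speedup of 8, and any processor beyond the 17th has no chunk to work on. Unequal chunk costs, as in the actual solver, reduce the attainable speedup further.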


Fig. 4.9 - Application speedup as a function of the number of processors used.


4.7 SFE3D Summary

This section provides a detailed summary of the implementation of the FE/spectral
algorithm described previously in this chapter. The resulting solver is hereafter referred to as
SFE3D.

SFE3D seeks to find an approximate solution to the 3D unsteady NS equations in terms of
the primitive variables (u, v, w, p). When the governing equations (4.3)-(4.5) are discretized
using the FE/spectral method, a system of linear equations is formed for each Fourier mode.
When solved, the result is an approximation for the Fourier coefficients of the primitive
variables (u, v, w, p). Because SFE3D uses a nodal ordering of the unknowns, when this linear
system is written in matrix form, the matrix equations to be solved can be characterized by
an 8 x 8 system associated with each node at each Fourier mode (4.33).

All  the  unknowns  appearing  in  (4.33)  at  each  Fourier  mode  are  obtained  by  looping 
through the nodes of each 2D element and summing the contributions. Each of the coefficient
contributions is obtained from analytical expressions derived from the combined FE/spectral
method.  These  expressions  are  given  in  (4.58)-(4.70)  and  (4.92)-(4.98). 

Figure  4.10  presents  a  flow  chart  giving  the  basic  structure  of  the  SFE3D  solver.  Each 
of  the  blocks  in  this  chart  will  now  be  discussed  in  some  detail. 

4.7.0.0.1 Boundary and initial conditions.

Boundary condition information read from the input grid file is processed, and Dirichlet
values are stored at all appropriate nodes. The initial conditions (the values of the primitive
variables at time step levels n and n - 1) are set to the values stored in the input grid file if a restart
is requested. Otherwise, the initial conditions are set to zero. The convective terms at time
steps n and n - 1 are also computed at this time (see the paragraph “Evaluation of convective
terms” for more details).
4.7.0.0.2 Create and store $[A]_k$ matrices.

The system matrix for each Fourier mode, $[A]_k$, is computed via the FE/spectral method
described in this chapter. These matrices are constant in time and hence need only be
computed once at the beginning of a run and then stored for later use. These matrices are
stored in the sparse matrix format described in Appendix A in the single shared array a(:,k).

4.7.0.0.3 Create and store preconditioning matrices.

Because the system matrices are constant, so too are the preconditioning matrices.
Hence, these matrices need only be computed once at the beginning of a run and then
stored for later use. In addition, because the system matrices associated with each Fourier
mode are very similar, preconditioning matrices need not be computed for every mode.
Instead, preconditioning matrices computed for one mode can be applied to the neighboring
modes. This approach saves memory and does so at very little cost in terms of computational
speed. These preconditioning matrices are stored in the sparse matrix format described in
Appendix A in the shared arrays pa(:,k), pia(:,k), and pja(:,k).

4.7.0.0.4 Create system matrix structure.

The structure of the sparse system matrices (the location of the non-zero entries) is
determined from the mesh connectivity information read from the modified DPlot input file.
The structures of the system matrices for each Fourier mode are identical. Consequently, the
structure is only computed once and stored in the shared arrays ia(:) and ja(:) described
in Appendix A.

4.7.0.0.5 Evaluation of convective terms.

In this parallelized block, the convective terms are calculated for each node on each of
the N equi-spaced 2D planes. A parallel do-loop is performed through the 2D planes. For
each plane, the following steps are completed. 1) The $\partial/\partial z$ and $\partial^2/\partial z^2$ values are calculated
for the DOFs at each node using finite-differences and are stored in temporary arrays. 2) The
convective terms (including the stabilization terms described in Section 4.5) are calculated
using finite-elements, treating the $\partial/\partial z$ and $\partial^2/\partial z^2$ terms as known DOFs at each node.
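Step 1 can be sketched for a single nodal quantity as follows. The text states only that finite-differences are used; the second-order central stencil, the routine name, and the argument list below are assumptions made for illustration. Periodicity in z follows from the Fourier representation in the transverse direction.

```fortran
module zderiv
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Second-order central differences in z on n equi-spaced periodic planes.
  ! f(j) is the value of one DOF at a given node on plane j; dz is the spacing.
  subroutine periodic_z_derivatives(n, dz, f, dfdz, d2fdz2)
    integer, intent(in) :: n
    real(dp), intent(in) :: dz, f(n)
    real(dp), intent(out) :: dfdz(n), d2fdz2(n)
    integer :: j, jm, jp
    do j = 1, n
      jm = j - 1
      if (jm < 1) jm = n        ! wrap around: the plane before 1 is plane n
      jp = j + 1
      if (jp > n) jp = 1        ! wrap around: the plane after n is plane 1
      dfdz(j)   = (f(jp) - f(jm)) / (2.0_dp*dz)
      d2fdz2(j) = (f(jp) - 2.0_dp*f(j) + f(jm)) / dz**2
    end do
  end subroutine periodic_z_derivatives

end module zderiv
```

In a plane-parallel loop of the kind described above, each thread would need only the values on the two neighboring planes, all of which are read-only during this step.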

4.7.0.0.6 FFT (boundary and initial conditions).

The  boundary  and  initial  conditions,  which  are  given  in  physical  space,  are  needed  in 
Fourier  space  for  use  in  the  evaluation  of  the  Fourier  coefficients.  This  transformation  is 
accomplished  via  a  node-by-node  FFT  in  the  transverse  direction. 
4.7.0.0.7 FFT (convective terms).

The  convective  terms,  which  are  calculated  in  physical  space,  must  be  transformed  into 
Fourier  space  for  use  in  the  evaluation  of  the  new  Fourier  coefficients.  This  transformation 
is  accomplished  via  a  node-by-node  FFT  in  the  transverse  direction. 

4.7.0.0.8 FFT (flowfield values).

The  flowfield  values  at  the  new  time  step  must  be  transformed  into  physical  space  for 
output  and/or  evaluation  of  the  convective  terms.  This  transformation  is  accomplished  via 
a  node-by-node  IFFT  in  the  transverse  direction. 

4.7.0.0.9 Fourier coefficient evaluation.

It is inside this parallelized block that most of the computation time is spent. The
decoupling of the Fourier modes allows a parallel do-loop to be executed through the Fourier
modes. For each mode, the following steps are completed. 1) The RHS vector b(:,k) is filled
as controlled by (4.33) and the boundary conditions. 2) The matrix equation is solved using
the SPARSKIT package (see Section 2.4). The zero vector is used as the initial guess. This
is the natural choice because the solution $\{\phi\}$ is the change in flow variables between time
steps and these values are expected to be small. 3) The solution for each mode is stored in
the shared array phi(:,k).

4.7.0.0.10  Initialization. 

The  user-defined  parameters  for  SFE3D  are  given  via  a  Fortran  NAMELIST  file.  These 
parameters  contain  such  information  as  time  step  size,  domain  transverse  dimension,  number 
of  time  steps,  number  of  threads,  sparse  matrix  solver  parameters,  and  Dirichlet  boundary 
condition values. Additionally, input and output filenames are obtained and the corresponding
files are opened.

4.7.0.0.11  Read  grid/restart  file. 

The  input  grid  file  is  read.  The  format  of  this  file  is  a  modified  form  of  the  DPlot 
format,  which  was  developed  at  the  von  Karman  Institute  and  is  described  in  Appendix  C. 
This  file  contains  the  number  of  2D  planes  (Fourier  modes),  the  2D  nodal  coordinates  and 
connectivity,  boundary  condition  types,  and  flowfield  values  at  each  node  and  plane  for  time 
steps n and n - 1. After this file has been read, one-time quantities such as element areas
and normals are computed. Also, memory is allocated for all arrays used in the solver.
4.7.0.0.12 Update time step.


Once  all  of  the  matrix  equations  have  been  solved  for  the  current  time  step,  the  flow 
variables  are  updated  for  the  next  time  step.  That  is, 


(Pi)r1  =  (fc)z 


(C  =  (*)!;_  war'-ws, 

(*)I = (C  ’  (”<)" = w1 ,  = (ft)r1 

( s< )  I+1 = (C + { (5i)  *  ■  (*i)‘+1 = (^r1 + 1  (“4  >  (ft)r1 = (ft)r' + * (ft)* 


4.7.0.0.13  Write  data  files. 


The  Fortran  NAMELIST  file  contains  an  output  frequency  parameter  stating  the  number 
of  time  steps  between  writing  of  the  output  files.  If  the  current  time  step  requests  an  output, 
a  restart  file  is  written  in  the  modified  DPlot  format  defined  in  Appendix  C  as  well  as  a 
post-processing  file  in  TecPlot  and/or  FieldView  unstructured  grid  format. 
Fig. 4.10 - Flow chart showing the basic structure of the SFE3D Navier-Stokes solver. Sections
of the code that are parallelized are denoted by a shaded box.


5.  3D  LAMINAR  FLOW  TEST  CASE 


This  chapter  presents  a  validation  test  case  for  the  SFE3D  solver.  Specifically,  the  case  of 
flow  past  a  circular  cylinder  in  the  3D  transition  regime  is  chosen  because  the  flow  is  laminar 
and  yet  there  exist  3D  features  with  distinct  characteristics  that  vary  with  Reynolds  number. 


5.1  Circular  Cylinder  Problem  Description 

A  short  discussion  on  circular  cylinder  flow  has  already  been  presented  in  Section  3.3.  As 
mentioned  in  that  section,  the  flow  over  a  circular  cylinder  is  interesting  because  it  contains 
many  distinct  flow  characteristics,  including  boundary  layers,  separation,  shear  layers,  and  a 
wake  region.  For  this  reason,  and  also  because  a  vast  amount  of  experimental  and  numerical 
data  exists  for  comparison,  circular  cylinder  flow  is  an  excellent  test  case  for  validating  NS 
solvers. 

The behavior of the flow is highly dependent upon the Reynolds number,

$$\mathrm{Re} = \frac{U_\infty D}{\nu}, \tag{5.1}$$

where $U_\infty$ is the freestream velocity, $D$ the cylinder diameter, and $\nu$ the kinematic viscosity.
For Re < 180 the wake behind the cylinder is 2D, while at Re ≈ 180 the wake becomes unstable to
transverse perturbations and develops 3D structures. Simulation results for the 2D regime
were presented in Section 3.3. In this chapter, Reynolds numbers of 195 and 300 are presented.
These Reynolds numbers are both in the 3D regime, but well below the onset of turbulence,
which occurs at Re ≈ 1200.

According to Williamson (Williamson, 1996b), at Re ≈ 180-200, the wake behind the
cylinder develops streamwise structures of rather large dimension. These structures are known
as mode-A instabilities and typically have wavelengths on the order of 3-4D. The Re = 195
case chosen for this study lies in this regime. As the Reynolds number is increased to around
260, 3D structures with wavelengths on the order of 1D, known as mode-B instabilities,
become  dominant.  The  Re  =  300  case  chosen  for  this  study  is  in  this  regime.  Figure  5.1, 
taken  from  the  experiments  of  Williamson  (Williamson,  1996a),  illustrates  the  mode-A  and 
mode-B  instabilities  via  dye  streaks  in  the  wake. 

Fig. 5.1 - 3D circular cylinder flow: mode-A and mode-B instabilities in the cylinder wake,
as visualized by Williamson (Williamson, 1996b). (a) shows mode-A instabilities
at Re = 200 with spanwise wavelength $\lambda/D = 4.01$. (b) shows mode-B instabilities
at Re = 300 with spanwise wavelength $\lambda/D \approx 1$.

The qualitative correctness of the SFE3D solver can be verified by its ability to produce
mode-A and mode-B instabilities at the correct Reynolds numbers. In addition, some quantitative
checks are performed for the Re = 300 case. Specifically, the shedding frequency $f$ in
terms of the Strouhal number,

$$\mathrm{St} = \frac{f D}{U_\infty}, \tag{5.2}$$

is compared with experiments. Also, the mean velocity profiles in the wake are compared
with experimental and numerical results. The mean values of interest are the streamwise
velocity $\bar{u}$, crossflow velocity $\bar{v}$, streamwise velocity fluctuations

$$\overline{u'^2} = \overline{uu} - \bar{u}^2, \tag{5.3}$$

crossflow velocity fluctuations

$$\overline{v'^2} = \overline{vv} - \bar{v}^2, \tag{5.4}$$

and the Reynolds shear stress

$$\overline{u'v'} = \overline{uv} - \bar{u}\,\bar{v}. \tag{5.5}$$
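The statistics (5.3)-(5.5) can be accumulated from sampled velocities as sketched below. The routine is an illustration rather than the post-processing actually used in this work: its name and interface are assumptions, and the samples u(1:n), v(1:n) stand for the values of the two velocity components at one wake location, collected over time and over the transverse direction.

```fortran
module wake_stats
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Mean velocities and second-order statistics from n samples, using
  ! mean(u'u') = mean(uu) - mean(u)**2 and the analogous relations.
  subroutine reynolds_stats(n, u, v, umean, vmean, uu, vv, uv)
    integer, intent(in) :: n
    real(dp), intent(in) :: u(n), v(n)
    real(dp), intent(out) :: umean, vmean, uu, vv, uv
    real(dp) :: rn
    rn = real(n, dp)
    umean = sum(u)/rn
    vmean = sum(v)/rn
    uu = sum(u*u)/rn - umean**2      ! streamwise fluctuations, cf. (5.3)
    vv = sum(v*v)/rn - vmean**2      ! crossflow fluctuations, cf. (5.4)
    uv = sum(u*v)/rn - umean*vmean   ! Reynolds shear stress, cf. (5.5)
  end subroutine reynolds_stats

end module wake_stats
```

Because only running sums of u, v, uu, vv, and uv are required, the samples need not be stored; the sums can be updated every time step and divided by the sample count at the end.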

5.2  Computational  Mesh 

The computational domain used for both the Re = 195 and Re = 300 cases is displayed in
Figure 5.2. A Dirichlet velocity inlet condition is applied 5D upstream of the cylinder center,
while a Neumann outlet condition exists 20D downstream. As reported by Kravchenko and
Moin (Kravchenko and Moin, 1998), the shedding frequency does not become independent of
the lateral domain size until approximately 60D. In order to keep the computations tractable,
however, the lateral domain extends only 7D from the cylinder in this study. At these extents
a slip wall condition is applied. Due to the proximity of the walls, a slightly higher shedding
frequency is expected compared to free-stream results. Mittal and Balachandar (Mittal and
Balachandar, 1995) studied the sensitivity of numerical simulations to transverse domain size
at Re = 300. They showed that if the domain is small, variations in the mean flow parameters
(lift, drag, shedding frequency) appear. Considering this study, as well as that of Kravchenko
and Moin (Kravchenko and Moin, 1998), a spanwise dimension of $2\pi D$ was deemed sufficient.
As mentioned in Chapter 4, periodic boundary conditions are applied on these spanwise
boundaries.


Fig. 5.2 - 3D circular cylinder flow: overview of the computational domain and mesh containing
33,000 in-plane triangle elements (17,000 in-plane nodes) and 32 Fourier
modes.

In the 2D plane, the mesh consists of 33,000 triangle elements with approximately 17,000
nodes. As seen in Figure 5.2, nodes are clustered in the wake region, with characteristic
element side lengths of approximately 0.15D. Outside the wake region the mesh is
coarser, with the characteristic element side length being 0.5D at the upstream and lateral
domain extents. Elements are also clustered toward the cylinder in order to better capture
the initial development of the shed vorticity. Boundary layer wedge-type elements are used
very near the cylinder walls, with the perpendicular dimension of the first element off the
cylinder being 0.005D. The smallest spanwise structures are expected at Re = 300 and have
a wavelength on the order of 1D. Because a spectral method is used in this direction, 16
modes are sufficient to capture these scales. However, in order to verify that smaller scales
are not dominant, 32 modes are used, allowing wavelengths as small as 0.4D to be captured.
Overall, the mesh contains approximately 544,000 nodes.
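The resolution figures quoted in this paragraph can be checked with a short calculation. It assumes that the "32 modes" correspond to $N = 32$ spanwise planes, so that the highest resolved wavenumber index is $N/2$, and it uses the spanwise extent $L_z = 2\pi D$ given earlier in this section.

```latex
\lambda_{\min} = \frac{L_z}{N/2} = \frac{2\pi D}{16} \approx 0.39\,D \quad (N = 32),
\qquad
\lambda_{\min} = \frac{2\pi D}{8} \approx 0.79\,D \quad (N = 16),
\\[4pt]
N_{\text{nodes}} \approx 17{,}000 \times 32 = 544{,}000 .
```

Under this assumption both choices resolve the expected mode-B wavelength of about 1D, while the finer spanwise resolution leaves a margin of roughly a factor of 2.5 for smaller structures.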

The mesh described above is certainly more refined than necessary. Consequently, to
check for grid convergence a coarse mesh simulation was also performed for the Re = 300 case.
In this mesh, the resolution is essentially halved in each direction, resulting in approximately
7,400 in-plane elements (3,800 in-plane nodes) and 16 Fourier modes.


5.3  Results 

For  each  case  (Re  =  195,  Re  =  300,  Re  =  300  coarse  mesh),  the  simulation  was  initialized 
using  potential  flow.  To  decrease  the  computation  time,  the  solution  was  allowed  to  develop 
as  a  2D  flow  until  the  Karman  vortex  street  was  fully  developed.  In  addition,  to  hasten  the 
asymmetric  shedding,  a  time-dependent  rotation  was  applied  to  the  cylinder  at  the  beginning 
of  the  simulation  for  a  short  time;  first  in  one  direction,  then  in  the  opposite.  After  the  2D 
Karman  vortex  street  had  developed,  a  small  perturbation  was  applied  at  the  inlet  boundary 
for  a  short  time  to  initiate  the  3D  features  of  the  flow. 

At Re = 195 and Re = 300, the cylinder wake is unstable to the spanwise perturbation,
and the perturbation should grow to develop the mode-A and mode-B streamwise vortex
structures. At lower Reynolds numbers, however, the wake is stable to spanwise perturbations.
As a first check on the SFE3D solver, a Re = 100 case was run. As expected, the
perturbation simply diminished as it was propagated downstream and eventually washed
out of the domain. Figure 5.3 shows instantaneous surfaces of constant transverse vorticity
at $tU_\infty/D = 40$ after termination of the inlet perturbation. The flow no longer has any 3D
features: only the zeroth Fourier mode is non-zero.


Fig.  5.3  -  3D  circular  cylinder  flow:  surfaces  of  constant  transverse  vorticity  showing  the  2D 
Karman  vortex  street  at  Re  =  100  after  the  transverse  perturbation  imposed  at 
the  inlet  has  washed  downstream. 

For the Re = 195 case, the perturbation does indeed persist and grow in the wake
region. Eventually, a statistically steady 3D Karman vortex street develops. Figure 5.4 shows
the qualitatively correct mode-A instabilities. The yellow and cyan surfaces mark constant
positive and negative transverse vorticity, and highlight the alternate vortex shedding. The
red and blue surfaces mark constant positive and negative streamwise vorticity and show
the mode-A structures with a characteristic wavelength of approximately 3D. These results are in excellent
agreement with the computations of Thompson et al. (Thompson et al., 1994) shown in
Figure 5.5.


Fig.  5.4  -  3D  circular  cylinder  flow:  mode-A  instabilities  in  the  wake  at  Re  =  195.  The 
red  and  blue  surfaces  mark  a  positive  and  negative  value  of  streamwise  vorticity, 
and  the  yellow  and  cyan  surfaces  mark  a  positive  and  negative  value  of  transverse 
vorticity. 

Figure 5.6 shows the qualitatively correct mode-B instabilities in the cylinder wake as
predicted by SFE3D. The yellow and cyan surfaces again mark constant positive and
negative transverse vorticity, and highlight the alternate vortex shedding. The red and
blue surfaces mark constant positive and negative values of streamwise vorticity and show the
mode-B structures with a characteristic wavelength of approximately 1D. Again, these results are in excellent
qualitative agreement with the computations of Thompson et al. (Thompson et al., 1994),
shown in Figure 5.7.

Fig. 5.5 - 3D circular cylinder flow: mode-A instabilities in the wake, as calculated by
Thompson et al. (Thompson et al., 1994). The light and dark surfaces mark a
particular value of positive and negative streamwise vorticity, and the other surface
marks a value of spanwise vorticity.

As a quantitative check on the results, the flow statistics are plotted and compared with
the published numerical results of Kravchenko and Moin (Kravchenko and Moin, 1998), who
used a B-spline method with zonal grids. These results, in turn, agree well with the spectral
results of Mittal and Balachandar (Mittal and Balachandar, 1997). Figures 5.8-5.12 show
velocity statistics at five streamwise locations in the wake. The SFE3D statistics are averaged
in the transverse direction and over approximately eight shedding cycles. From these figures
it is seen that the fine-grid SFE3D results are in good agreement with the B-spline results
for both the mean velocities and the time-averaged fluctuations. The slight discrepancy
between the SFE3D and B-spline results has two likely causes. The first is that this author
was unable to acquire tabular data from either Kravchenko or Moin. Instead, the B-spline
profiles were digitized from published plots, which introduces error. The second probable
cause is the proximity of the lateral wall boundaries. In the B-spline simulations, a far-field
condition was imposed 30D from the cylinder, whereas the SFE3D simulations utilize a slip-wall
condition located 7D from the cylinder. The nearness of the walls causes an additional
acceleration of the flow as it passes the cylinder, resulting in increased recirculation and
higher velocity fluctuations. Coarse-grid SFE3D results are also shown in Figures 5.8-5.12.
These results are sufficiently close to the fine-grid results, and convergence toward the published values
is achieved with grid refinement.

The time-averaged drag coefficient for the cylinder at Re = 300 is computed using (3.4),
averaged over the span. The shedding frequency is determined by computing the power spectrum
of the time-accurate lift coefficient, and the Strouhal number is subsequently evaluated
from equation (5.2). The results are shown in Table 5.1 along with some experimental values
and the numerical results of Kravchenko and Moin (Kravchenko and Moin, 1998). The
drag coefficient predicted by SFE3D is slightly higher than that measured by Weiselsberger
(Weiselsberger, 1922), but is sufficiently accurate and changes very little between the coarse
and fine meshes. The Strouhal number did not change with grid refinement, and is slightly
higher than the value measured by Williamson (Williamson, 1996b). This is to be expected,
however, due to the proximity of the lateral walls in the SFE3D simulations.
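The procedure for the shedding frequency can be sketched as follows. This is a hedged illustration only: the text does not say how the power spectrum was computed, so the direct O(n^2) discrete Fourier transform, the function names, and the argument lists are all assumptions, and a production code would normally call an FFT library instead. The Strouhal number is formed from the standard definition St = fD/U, with D the cylinder diameter and U the freestream velocity.

```fortran
module strouhal_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Frequency of the largest peak in the power spectrum of the uniformly
  ! sampled lift history cl(1:n) with sampling interval dt.
  function dominant_frequency(n, dt, cl) result(fpeak)
    integer, intent(in) :: n
    real(dp), intent(in) :: dt, cl(n)
    real(dp) :: fpeak
    real(dp), parameter :: pi = 3.14159265358979323846_dp
    real(dp) :: mean, re, im, arg, power, pmax
    integer :: k, j, kmax
    mean = sum(cl)/real(n, dp)           ! remove the mean so k = 0 is ignored
    pmax = -1.0_dp
    kmax = 1
    do k = 1, n/2                        ! positive frequencies up to Nyquist
      re = 0.0_dp
      im = 0.0_dp
      do j = 1, n
        arg = 2.0_dp*pi*real(k, dp)*real(j-1, dp)/real(n, dp)
        re = re + (cl(j) - mean)*cos(arg)
        im = im - (cl(j) - mean)*sin(arg)
      end do
      power = re*re + im*im
      if (power > pmax) then
        pmax = power
        kmax = k
      end if
    end do
    fpeak = real(kmax, dp)/(real(n, dp)*dt)
  end function dominant_frequency

  ! Strouhal number from shedding frequency f, diameter d, and velocity uinf.
  function strouhal(f, d, uinf) result(st)
    real(dp), intent(in) :: f, d, uinf
    real(dp) :: st
    st = f*d/uinf
  end function strouhal

end module strouhal_demo
```

The frequency resolution of such an estimate is 1/(n dt), so the length of the lift record limits how precisely the Strouhal number can be determined.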


Fig.  5.6  -  3D  circular  cylinder  flow:  mode-B  instabilities  in  the  wake  at  Re  =  300.  The 
red  and  blue  surfaces  mark  a  positive  and  negative  value  of  streamwise  vorticity, 
and  the  yellow  and  cyan  surfaces  mark  a  positive  and  negative  value  of  transverse 
vorticity. 


Table 5.1 - 3D circular cylinder flow: comparison of CD and St for SFE3D, published experiments
(Weiselsberger, 1922; Williamson, 1996b), and published numerical results

                        CD      St
SFE3D (coarse)          1.24    [illegible]
SFE3D (fine)            1.29    [illegible]
Williamson exp.         -       0.203
Weiselsberger exp.      1.22    -
B-Spline                1.28    0.203

Fig. 5.7 - 3D circular cylinder flow: mode-B instabilities in the wake, as calculated by
Thompson et al. (Thompson et al., 1994). The light and dark surfaces mark a
particular value of positive and negative streamwise vorticity, and the other surface
marks a value of spanwise vorticity.


Fig. 5.8 - 3D circular cylinder flow: mean streamwise velocity at different streamwise locations
in the wake at Re = 300, plotted against y/D. Curves: SFELES fine grid, SFELES coarse grid,
and B-spline simulations by Kravchenko and Moin (Kravchenko and Moin, 1998).

Fig. 5.9 - 3D circular cylinder flow: mean lateral velocity at different streamwise locations
in the wake at Re = 300. Curves: SFELES fine grid, SFELES coarse grid,
and B-spline simulations by Kravchenko and Moin (Kravchenko and Moin, 1998).

Fig. 5.10 - 3D circular cylinder flow: time-averaged streamwise velocity fluctuations at different
streamwise locations in the wake at Re = 300, plotted against y/D. Curves: SFELES fine grid,
SFELES coarse grid, and B-spline simulations by Kravchenko and Moin
(Kravchenko and Moin, 1998).

Fig. 5.11 - 3D circular cylinder flow: time-averaged lateral velocity fluctuations at different
streamwise locations in the wake at Re = 300. Curves: SFELES fine grid, SFELES
coarse grid, and B-spline simulations by Kravchenko and Moin (Kravchenko
and Moin, 1998).

Fig. 5.12 - 3D circular cylinder flow: time-averaged Reynolds shear stress at different streamwise
locations in the wake at Re = 300. Curves: SFELES fine grid, SFELES
coarse grid, and B-spline simulations by Kravchenko and Moin (Kravchenko and
Moin, 1998).
6. LARGE-EDDY SIMULATION SOLVER


It  is  commonly  felt  that  the  pacing  factor  for  the  development  of  more  accurate  design 
and  analysis  tools  for  aerodynamic  and  industrial  applications  is  the  development  of  better 
models  of  turbulent  flow.  Since  the  first  LES  results  were  published  by  Deardorff  in  1970 
(Deardorff,  1970),  great  advances  have  been  made  both  in  LES  algorithms  and  available 
computer  hardware.  In  its  infancy,  LES  was  limited  to  simple  flows  and  to  the  very  few 
groups that could afford access to supercomputers. It was developed as a complement to
laboratory  experiments  investigating  the  fundamental  physics  of  simple  turbulent  flows.  With 
the  advances  in  computer  technology,  the  possibility  of  using  LES  is  now  available  to  many, 
and  can  be  applied  to  more  complex  flows.  However,  LES  is  still  limited  mainly  to  rather 
simple  geometries  that  can  be  meshed  via  Cartesian  or  other  structured  meshes. 

In  this  chapter  the  extension  of  the  3D  solver  developed  in  Chapter  4  to  LES  is  discussed. 
The  LES  approach  to  handling  turbulence  is  presented  and  compared  with  the  RANS  and 
DNS  approaches.  The  governing  equations  are  then  derived  and  a  review  of  subgrid  scale 
models  is  given.  Finally,  the  implementation  procedure  in  terms  of  the  FE/spectral  method 
of  this  work  is  provided. 


6.1  LES  Approach  to  Turbulence 

Simulations of turbulent flows are difficult because of the large range of scales involved:
the largest scales are of the order of the flow domain, while the smallest are of the order of
the Kolmogorov length scale (Kolmogorov, 1941, 1962). The DNS approach to turbulence involves
directly solving the NS equations with no modeling of the turbulence; all turbulent
fluctuations are calculated explicitly, as shown in Figure 6.1(a). Of course, because
turbulence is an unsteady and 3D phenomenon, no 2D or steady-state simplifications can be made
in solving the NS equations. As a consequence of needing to capture the small scale (high
frequency) fluctuations, extremely fine meshes and time steps are required. In one dimension,
the number of nodes required is $N \sim L/\eta$, where $L$ is the dimension of the computational
domain and $\eta$ is the Kolmogorov length scale. This ratio scales as $Re^{3/4}$; consequently the
number of nodes needed for DNS (which is 3D) scales as $N^3 \sim Re^{9/4}$. Taking into account the
number of calculations per node and the number of time steps required to advance the solution
for a sufficient time, the overall cost of a DNS computation is $O(Re^3)$. In other words,
doubling the Reynolds number increases the computational effort by at least a factor of
eight. In addition, highly accurate (high-order) schemes must be used in order to reduce
dispersion and dissipation errors. These schemes (spectral methods, for example) do not
lend themselves well to complex geometries and/or boundary conditions. Due to current
algorithm and (especially) hardware limitations, DNS is limited to simple geometries and
very low Reynolds numbers, typically below those of engineering interest.
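
The scaling estimates above can be made concrete with a short calculation. The following is a minimal Python sketch (the function name `dns_cost_growth` is hypothetical and not part of the solver) that evaluates how the node counts and the overall cost grow when the Reynolds number is multiplied by a given factor, using only the exponents quoted in the text.

```python
def dns_cost_growth(re_factor):
    """Growth of DNS requirements when Re is multiplied by re_factor.

    Uses only the scaling exponents quoted in the text:
    nodes per direction ~ Re^(3/4), total nodes ~ Re^(9/4), cost ~ Re^3.
    """
    return {
        "nodes_1d": re_factor ** 0.75,
        "nodes_3d": re_factor ** 2.25,
        "total_cost": re_factor ** 3.0,
    }


# Doubling the Reynolds number
growth = dns_cost_growth(2.0)
for name, value in growth.items():
    print(f"{name}: x{value:.3f}")
```

Doubling Re multiplies the total node count by roughly 4.8 and the overall cost by 8, in line with the factor of eight stated above.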
Simulations of high Reynolds number turbulent flows have nevertheless been commonplace for
the past three decades as a result of utilizing the RANS equations. In this approach, one
follows the original idea of Reynolds (Reynolds, 1895) in assuming the fluid is in a randomly
unsteady turbulent state and, via time-averaging, all quantities can be considered as having
a mean and a fluctuating part. The time average of a function is defined as

$$ F = \frac{1}{T} \int_0^T f(t)\, dt, \tag{6.1} $$

where $T$ is a time interval much longer than the largest fluctuation time scale. The function
$f(t)$ can now be decomposed into its mean $F$ and fluctuating $f(t) - F$ parts. Applying this
averaging  to  the  NS  equations  results  in  the  RANS  equations,  which  are  solved  only  for  the 
mean  flow  quantities  while  accounting  for  the  effects  of  the  fluctuations  via  a  turbulence 
model.  In  this  way,  all  unsteadiness  is  averaged  out  as  being  part  of  the  turbulence,  shown 
in  Figure  6.1(b).  RANS  solutions  tend  to  suffer  from  accuracy  and  generality  problems,  both 
of  which  are  attributed  to  the  shortcomings  of  the  turbulence  models.  Models  are  typically 
‘tuned’ by applying them to simple flows for which theoretical or thoroughly validated
experimental results exist. When the models are applied to flows that differ from those by
which they are tuned, adjustment of the model is required in order to obtain acceptable
results. These shortcomings are a consequence of the model being required to represent such a
wide range of scales. As stated by Piomelli (Piomelli, 1994), “While the small scales tend to
depend only on viscosity, and may, therefore, be somewhat universal, the large ones are
affected very strongly by the boundary conditions... Thus, it does not seem possible to model
the effect of the large scales of turbulence in the same way in flows that are very different.”
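
The Reynolds decomposition of (6.1) can be illustrated numerically. The sketch below is an assumption-laden example, not part of the solver: the signal, the sampling, and the helper name `reynolds_decompose` are all hypothetical, and the integral is approximated with the trapezoidal rule.

```python
import numpy as np


def reynolds_decompose(f, t):
    """Split a sampled signal into its time mean F and fluctuation f - F.

    The mean is (1/T) * integral of f dt, approximated by the trapezoidal rule.
    """
    period = t[-1] - t[0]
    mean = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)) / period
    return mean, f - mean


# Hypothetical signal: mean value 2.0 plus a 3 Hz fluctuation, sampled over 10 s
t = np.linspace(0.0, 10.0, 10001)
f = 2.0 + 0.5 * np.sin(2.0 * np.pi * 3.0 * t)
F, f_prime = reynolds_decompose(f, t)
print(F, np.max(np.abs(f_prime)))
```

Because the averaging interval spans many fluctuation periods, the mean is recovered and all of the unsteadiness ends up in the fluctuating part, which is exactly the part a RANS turbulence model has to represent.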

The LES approach lies between the extremes of DNS, in which all fluctuations are resolved,
and the RANS approach, in which only mean quantities are calculated and the fluctuations
modeled. The basic philosophy, as illustrated in Figure 6.1(c), is to explicitly compute
only  the  large-scale  motions  that  are  directly  affected  by  the  boundary  conditions,  and  to 
model  the  effects  of  the  small-scale  motions.  The  driving  notion  behind  this  approach  is 
that  the  small-scale  motions  are  more  isotropic  and  dependent  primarily  on  viscosity  effects; 
modeling  of  these  terms  should  therefore  be  more  general.  In  addition,  the  majority  of  the 
computational  cost  of  DNS  is  spent  in  resolving  the  small-scale,  dissipative  motions,  while 
the  quantities  of  engineering  interest  can  be  calculated  mainly  from  the  large-scale  motions. 
Because  the  large  scales  of  turbulence  are  explicitly  calculated,  LES  is  inherently  unsteady, 
3D,  and  requires  fairly  fine  meshes  and  time  steps.  However,  it  can  be  used  at  much  higher 
Reynolds  numbers  than  DNS.  In  fact,  assuming  the  small  scales  ideally  obey  inertial  range 
dynamics,  and  in  the  absence  of  solid  boundaries,  the  cost  of  a  computation  is  independent 
of  Reynolds  number. 

Fig. 6.1 - Comparison of the different approaches to handling turbulence: a) Direct Numerical
Simulation (DNS), b) Reynolds Averaged Navier-Stokes (RANS), c) Large Eddy Simulation (LES).

The separation of the large scales from the small is accomplished via a low-pass filter
operation. A filtered quantity, denoted in this work by an overbar, is defined in the manner
introduced by Leonard (Leonard, 1974):

$$ \bar{f}(x,t) = \int_{\Omega} f(r,t)\, G(x - r)\, dr, \tag{6.2} $$

where $\Omega$ is the computational domain and $G$ is the filter function. The filter function must
satisfy the normalization condition

$$ \int_{\Omega} G(x - r)\, dr = 1. \tag{6.3} $$

It also should provide a mean-preserving filtering operation (for a constant filter width) that
commutes with differentiation.

The most commonly used filter functions are the tophat filter,

$$ G(x) = \begin{cases} 1/\Delta, & \text{if } |x| < \Delta/2 \\ 0, & \text{otherwise,} \end{cases} \tag{6.4} $$

shown in Figure 6.2a; the Gaussian filter,

$$ G(x) = \sqrt{\frac{6}{\pi \Delta^2}}\; e^{-6x^2/\Delta^2}, \tag{6.5} $$

shown in Figure 6.2b; and the Fourier cutoff filter,

$$ G(x) = \frac{\sin(\pi x/\Delta)}{\pi x}, \tag{6.6} $$

shown in Figure 6.2c. In the above expressions, $\Delta$ is the filter width.

Fig. 6.2 - Commonly-used filter functions: a) tophat, b) Gaussian, and c) sharp Fourier
cutoff.

Figure 6.3 shows how each of these filter functions performs when applied to a general 1D
function. The dotted line shows the original function, the solid line is the filtered function,
and the diamond symbols show the spacing of the sampling points. Frames (a)-(c) all utilize
the same filter width, while frame (d) shows the effect of reducing the filter width by a factor
of eight using the sharp Fourier cutoff filter function. It is seen that each filter function
successfully filters out the high-frequency oscillations. It is also seen that the sharp Fourier
cutoff tends to better reproduce the smaller resolved scales. However, this added performance
comes at the cost of being non-local in physical space.
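
The effect of a filter on a 1D function can be reproduced with a few lines of code. The following sketch applies a discrete tophat (box) filter to a periodic sampled signal; the signal, the number of points, and the helper name `tophat_filter_periodic` are assumptions made for illustration and do not reproduce the function plotted in Figure 6.3.

```python
import numpy as np


def tophat_filter_periodic(f, width_pts):
    """Centered box filter of a periodic sampled signal (width_pts must be odd).

    The kernel is the discrete analogue of G = 1/Delta on |x| < Delta/2 and
    sums to one, so constants are preserved.
    """
    kernel = np.ones(width_pts) / width_pts
    half = width_pts // 2
    padded = np.concatenate([f[-half:], f, f[:half]])  # periodic wrap-around
    return np.convolve(padded, kernel, mode="valid")


# Hypothetical signal: one large-scale mode plus a small-scale oscillation
x = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
f = np.sin(x) + 0.3 * np.sin(40.0 * x)
f_bar = tophat_filter_periodic(f, 9)

# Amplitude of each Fourier mode after filtering
amplitude = 2.0 * np.abs(np.fft.fft(f_bar)) / f.size
print(amplitude[1], amplitude[40])
```

The large-scale mode passes almost unchanged while the small-scale oscillation is strongly attenuated, although not removed entirely.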

Additional insight into these filter functions can be obtained by analyzing their properties
in Fourier space. Given that $f(x)$ has a Fourier transform

$$ \hat{f}(k) = \mathcal{F}[f(x)], \tag{6.7} $$

the Fourier transform of the filtered function is

$$ \hat{\bar{f}}(k) = \hat{G}(k)\, \hat{f}(k), \tag{6.8} $$

where $\hat{G}(k)$ is termed the transfer function and is $2\pi$ times the Fourier transform of the
filter function,

$$ \hat{G}(k) = 2\pi\, \mathcal{F}[G(x)]. \tag{6.9} $$

The tophat transfer function is

$$ \hat{G}(k) = \frac{\sin\left(\tfrac{1}{2} k \Delta\right)}{\tfrac{1}{2} k \Delta}, \tag{6.10} $$

the Gaussian transfer function is

$$ \hat{G}(k) = e^{-k^2 \Delta^2 / 24}, \tag{6.11} $$

and the sharp Fourier cutoff transfer function is

$$ \hat{G}(k) = \begin{cases} 1, & \text{if } |k| \le k_c \\ 0, & \text{otherwise,} \end{cases} \tag{6.12} $$

where $k_c$ is the cutoff frequency

$$ k_c = \frac{\pi}{\Delta}. \tag{6.13} $$

The characteristics of the filter functions in Fourier space can now be viewed in terms
of their attenuation factors, $\hat{G}(k)^2$, shown in Figure 6.4. It is seen that the tophat filter
is not effective in attenuating high wave numbers. In addition, the tophat and Gaussian filters
not only attenuate the high frequencies (small scales), but also some of the low frequencies
(large scales) as well.
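
The behavior described above can be checked by evaluating the three transfer functions directly. This is a minimal sketch assuming the forms $\sin(k\Delta/2)/(k\Delta/2)$, $e^{-k^2\Delta^2/24}$, and a unit step at $k_c = \pi/\Delta$; the function names and the value of `delta` are hypothetical.

```python
import numpy as np


def tophat_transfer(k, delta):
    # np.sinc(x) = sin(pi x)/(pi x), so this equals sin(k*delta/2)/(k*delta/2)
    return np.sinc(k * delta / (2.0 * np.pi))


def gaussian_transfer(k, delta):
    return np.exp(-((k * delta) ** 2) / 24.0)


def cutoff_transfer(k, delta):
    # One below the cutoff wavenumber k_c = pi/delta, zero above it
    return np.where(np.abs(k) <= np.pi / delta, 1.0, 0.0)


delta = 0.1                      # hypothetical filter width
k_lo = 0.5 * np.pi / delta       # half the cutoff wavenumber
k_hi = 3.0 * np.pi / delta       # three times the cutoff wavenumber
for name, g in [("tophat", tophat_transfer),
                ("gaussian", gaussian_transfer),
                ("cutoff", cutoff_transfer)]:
    print(name, float(g(k_lo, delta)) ** 2, float(g(k_hi, delta)) ** 2)
```

At three times the cutoff wavenumber the tophat attenuation factor is still about 0.045, whereas the Gaussian one is below 0.001; at half the cutoff both are already below one, while the sharp cutoff leaves that scale untouched.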


6.2  Governing  Equations 

The LES governing equations are derived by applying the filtering operation (6.2) to the
NS equations. The incompressible NS equations in the absence of body forces and written in
tensor notation are

$$ \frac{\partial u_i}{\partial t} + \frac{\partial}{\partial x_j}(u_i u_j) = -\frac{\partial p}{\partial x_i} + \nu \nabla^2 u_i, \tag{6.14} $$

$$ \frac{\partial u_i}{\partial x_i} = 0, \tag{6.15} $$

where $u_i$ is the $i$th component of the velocity vector and $p$ is the kinematic pressure
(pressure divided by density). The particular filter function $G$ used in arriving at the
governing equations is immaterial, as long as it follows the rules given in the previous
section. The filter operation (6.2) commutes with both temporal and spatial derivatives (see
Appendix D). That is,

$$ \overline{\frac{\partial f}{\partial t}} = \frac{\partial \bar{f}}{\partial t}, \tag{6.16} $$

$$ \overline{\frac{\partial f}{\partial x_i}} = \frac{\partial \bar{f}}{\partial x_i}. \tag{6.17} $$

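Using (6.17), filtering of the continuity equation (6.15) is straightforward:

$$ \overline{\frac{\partial u_i}{\partial x_i}} = \frac{\partial \bar{u}_i}{\partial x_i} = 0. \tag{6.19} $$

The momentum equation (6.14) is equally straightforward using both (6.16) and (6.17):

$$ \overline{\frac{\partial u_i}{\partial t} + \frac{\partial}{\partial x_j}(u_i u_j)} = \overline{-\frac{\partial p}{\partial x_i} + \nu \nabla^2 u_i} $$

$$ \frac{\partial \bar{u}_i}{\partial t} + \frac{\partial}{\partial x_j}\left(\overline{u_i u_j}\right) = -\frac{\partial \bar{p}}{\partial x_i} + \nu \nabla^2 \bar{u}_i. \tag{6.20} $$

The $\overline{u_i u_j}$ term poses a problem because it is the filtering operation applied to a
product, and $\overline{u_i u_j} \neq \bar{u}_i \bar{u}_j$. The above expression is typically rewritten as

$$ \frac{\partial \bar{u}_i}{\partial t} + \frac{\partial}{\partial x_j}(\bar{u}_i \bar{u}_j) = -\frac{\partial \bar{p}}{\partial x_i} + \frac{\partial \tau_{ij}}{\partial x_j} + \nu \nabla^2 \bar{u}_i, \tag{6.21} $$

where

$$ \tau_{ij} = \bar{u}_i \bar{u}_j - \overline{u_i u_j}. \tag{6.22} $$

This term, which accounts for the effects of the unresolved scales, is called the subgrid-scale
(SGS) Reynolds stress, and must be modeled via a SGS model.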

With  the  exception  of  the  SGS  Reynolds  stress  term,  the  LES  governing  equations  are 
identical  in  form  to  the  laminar  NS  equations.  Consequently  (assuming  that  a  suitable  model 
for  the  SGS  Reynolds  stress  exists)  nearly  identical  solution  algorithms  can  be  employed. 
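
That the filter of a product differs from the product of the filtered fields can be seen in a simple 1D experiment. The sketch below is illustrative only: it uses a hypothetical two-scale signal and a discrete box filter (helper name `box_filter` assumed), and evaluates the SGS stress with the sign convention of (6.22).

```python
import numpy as np


def box_filter(f, width_pts):
    """Centered periodic box filter; width_pts must be odd."""
    kernel = np.ones(width_pts) / width_pts
    half = width_pts // 2
    padded = np.concatenate([f[-half:], f, f[:half]])
    return np.convolve(padded, kernel, mode="valid")


# Hypothetical 1D "velocity": a resolved mode plus a subgrid-scale mode
x = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
u = np.sin(x) + 0.2 * np.sin(30.0 * x)

u_bar = box_filter(u, 17)
uu_bar = box_filter(u * u, 17)

# SGS stress with the sign convention of (6.22)
tau = u_bar * u_bar - uu_bar
print(tau.min(), tau.max())
```

With a non-negative, normalized kernel the filtered square is never smaller than the square of the filtered field, so `tau` is non-positive everywhere here and clearly non-zero: the small-scale mode removed by the filter still leaves its imprint on the resolved momentum balance.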
6.3 Subgrid-Scale Models

The  filtered  momentum  equations  are  closed  via  a  SGS  model  that  attempts  to  account 
for  the  effects  of  the  small  scales  on  the  resolved  scales.  The  primary  task  of  the  model  is  to 
correctly  account  for  the  energy  transfer  between  the  large  and  small  scales.  While  on  the 
average  energy  is  transferred  from  the  large  scales  to  the  subgrid  scales,  energy  flow  from  the 
small  to  the  large  scales  can  occur  intermittently.  This  phenomenon  is  called  backscatter  and 
is  currently  an  area  of  interest  for  SGS  model  development.  Most  models  in  use,  however, 
are  absolutely  dissipative — only  allowing  energy  to  flow  into  the  small  scales. 

Though  it  is  an  area  of  intense  research,  this  work  does  not  attempt  to  further  the 
technology  of  SGS  models.  Instead,  the  focus  is  on  the  development  and  implementation  of 
the  FE/spectral  discretization  algorithm.  Consequently,  the  common  Smagorinski  model  is 
implemented.  In  this  section  then,  the  majority  of  effort  is  spent  discussing  the  Smagorinski 
model.  Other  common  models  are  presented,  but  only  briefly.  The  interested  reader  is 
referred  to  the  references  cited  in  conjunction  with  each  model. 


6.3.1  Smagorinski  Model 

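The Smagorinski model, originally proposed in 1963 (Smagorinski, 1963), is certainly the
most commonly-used SGS model. A Boussinesq approximation is first made, in which the
SGS Reynolds stress is assumed proportional to the resolved strain:

$$ \tau_{ij} = 2 \nu_t \bar{S}_{ij} + \tfrac{1}{3} \tau_{kk} \delta_{ij}, \tag{6.23} $$

where

$$ \bar{S}_{ij} = \frac{1}{2}\left(\frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i}\right) \tag{6.24} $$

is the strain rate tensor of the filtered field, and $\nu_t$ is termed the eddy viscosity. Models
of this type are commonly referred to as eddy viscosity models.

Substituting the Boussinesq approximation (6.23) into the filtered momentum equation
(6.21) gives

$$ \frac{\partial \bar{u}_i}{\partial t} + \frac{\partial}{\partial x_j}(\bar{u}_i \bar{u}_j) = -\frac{\partial P}{\partial x_i} + 2 \frac{\partial}{\partial x_j}\left[(\nu + \nu_t) \bar{S}_{ij}\right], \tag{6.25} $$

where

$$ P = \bar{p} - \tfrac{1}{3} \tau_{kk}, \tag{6.26} $$

and, as before, $\bar{p}$ is the filtered kinematic pressure. From the incompressibility condition,
(6.25) can be rewritten as

$$ \frac{\partial \bar{u}_i}{\partial t} + \frac{\partial}{\partial x_j}(\bar{u}_i \bar{u}_j) = -\frac{\partial P}{\partial x_i} + \nu \nabla^2 \bar{u}_i + \nu_t \nabla^2 \bar{u}_i + 2 \frac{\partial \nu_t}{\partial x_j} \bar{S}_{ij}. \tag{6.27} $$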
(6.27) 
It is seen that, with the exception of the last two terms on the right-hand side, (6.27) is
identical in form to the laminar momentum equation (4.1).

From dimensional analysis, $\nu_t$ should be proportional to a length scale times a velocity
scale:

$$ \nu_t \sim \ell\, q_{sgs}. \tag{6.28} $$

The most active resolved scales are those very near the filter width; consequently the natural
length scale for LES is the filter width $\Delta$. Smagorinski’s model, based on a mixing length
hypothesis, assumes $\nu_t$ to be proportional to the filter width and the resolved shear. Using
dimensional analysis, Smagorinski arrived at the following SGS model:

$$ \nu_t = C_s^2 \Delta^2 |\bar{S}|, \tag{6.29} $$

where

$$ |\bar{S}| = \left(2 \bar{S}_{ij} \bar{S}_{ij}\right)^{1/2} \tag{6.30} $$

is the resolved local strain rate, $\Delta$ is the filter width, and $C_s$ is the Smagorinski constant,
typically $\approx 0.2$.

As stated previously, the Smagorinski model is by far the most commonly used, and has
provided reasonable results in countless studies. It is not the general model that is hoped
for, though, because adjustments must be made with differing geometries and types of flow.
In 1966, Lilly (Lilly, 1995) evaluated $C_s$ analytically in terms of integrals of the velocity
correlation function. Assuming a 2nd order central discretization, he obtained

$$ C_s \approx 0.23. \tag{6.31} $$

In 1967, however, Lilly (Lilly, 1967) re-evaluated $C_s$ by assuming the existence of a
prescribed inertial range spectrum and then approximately evaluating $|\bar{S}|$ by integrating the
spectrum over all resolved wavenumbers. Using this approach he arrived at

$$ C_s \approx 0.18. \tag{6.32} $$

Deardorff published one of the earliest applications of Smagorinski’s model to LES in 1970
(Deardorff, 1970). He found that the $C_s$ of Lilly (Lilly, 1995) was too diffusive and concluded
that

$$ C_s \approx 0.1 \tag{6.33} $$

was preferred for plane channel flow. In 1980, McMillan et al. (McMillan et al., 1980) found
similar results for homogeneous shear flows. In more recent channel flow LES studies (Ferziger
and Peric, 1999), where the scales were much better resolved than in Deardorff’s study, a
value of

$$ C_s \approx 0.065 \tag{6.34} $$

was optimal for the bulk of the flow.
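
The Smagorinski relation (6.29), together with the strain rate magnitude (6.30), is simple to evaluate once the resolved velocity gradient is known. The following is a minimal sketch for a single point; the function name `smagorinsky_nu_t`, the sample gradient, and the numerical values are assumptions chosen for illustration.

```python
import numpy as np


def smagorinsky_nu_t(grad_u, delta, c_s=0.1):
    """Eddy viscosity nu_t = (C_s * Delta)^2 * |S| at one point.

    grad_u[i, j] holds d(u_i)/d(x_j) of the filtered velocity field (3x3).
    """
    strain = 0.5 * (grad_u + grad_u.T)              # resolved strain rate tensor
    strain_mag = np.sqrt(2.0 * np.sum(strain * strain))
    return c_s ** 2 * delta ** 2 * strain_mag


# Hypothetical pure shear: du/dy = 10 1/s, filter width 0.01 m, C_s = 0.1
grad_u = np.zeros((3, 3))
grad_u[0, 1] = 10.0
nu_t = smagorinsky_nu_t(grad_u, delta=0.01, c_s=0.1)
print(nu_t)
```

For pure shear the strain rate magnitude reduces to the shear rate itself, so the result is $0.1^2 \times 0.01^2 \times 10 = 10^{-5}$. Because $C_s$ enters squared, moving from $C_s = 0.1$ to $C_s = 0.2$ quadruples the eddy viscosity, which is why the choice of constant matters so much in practice.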
To obtain proper behavior near the wall, rather arbitrary damping of the eddy viscosity
must be introduced. The eddy viscosity must also be artificially reduced when stratification
and/or rotational effects are present. Very near the wall, $C_s$ should tend toward zero. To
accomplish this, a van Driest damping function is often applied:

$$ C_s = C_{s0}\left(1 - e^{-\eta^+/A^+}\right), \tag{6.35} $$

where

$$ \eta^+ = \frac{\eta \sqrt{\tau_w/\rho}}{\nu}, \tag{6.36} $$

and $\eta$ is the normal distance from the wall, $\tau_w$ is the wall shear stress, $C_{s0}$ is the
Smagorinski constant away from the wall, and $A^+$ is a constant $\approx 25$.


Clark et al. (Clark et al., 1979) studied LES applied to the decay of homogeneous,
isotropic turbulence. In order to assess the performance of the Smagorinski model, the
velocity field obtained from DNS was filtered. From this, the “exact” SGS stresses were
calculated and compared to the modeled ones. It was found that the stresses predicted with
the Smagorinski model do not correlate well with the exact ones. It was found, however, that
the volume-averaged SGS dissipation is fairly accurate. McMillan et al. (McMillan et al., 1980)
performed a comparable study using homogeneous shear flows and found similar results.

In another set of studies assessing the performance of the Smagorinski model, Piomelli
et al. (Piomelli et al., 1990) investigated the model applied to transition in the boundary
layer of planar channel flow. It was found that perturbations in the boundary layer were
overdamped at the early stages of transition, sometimes leading to relaminarization even at
super-critical Reynolds numbers.


Despite  the  rather  long  list  of  problems  with  the  Smagorinski  model,  it  remains  popular 
and  is  successful  at  predicting  a  number  of  flows.  There  are  two  main  reasons  for  this  success. 
The  first  is  that  the  model  does  predict  the  integrated  energy  transfer  from  the  large  scales 
to  the  small  well,  even  though  there  is  no  means  for  backscatter  effects.  The  second  is  that 
a  number  of  ad  hoc  corrections  have  been  introduced  in  the  near-wall  region  and  in  the 
transition  to  turbulence. 


6.3.2  Other  SGS  Models 

Though  the  Smagorinski  model  is  the  most  commonly  used,  its  many  drawbacks  have 
prompted  intense  research  in  the  area  of  SGS  models.  Four  of  the  more  popular  models 
will  now  be  discussed.  For  further  reading  on  the  advancement  of  SGS  models,  the  author 
recommends  the  review  article  by  Lesieur  and  Metais  (Lesieur  and  Metais,  1996). 
6.3.2.1 Kraichnan’s Spectral Eddy Viscosity Model

Kraichnan’s spectral eddy viscosity model (Kraichnan, 1976; Lesieur, 1990) utilizes a
sharp Fourier cutoff filter function. The eddy viscosity relation is

$$ \nu_t(k, k_c) = 0.441\, C_k^{-3/2} \left[\frac{E(k_c)}{k_c}\right]^{1/2} \nu_t^*\!\left(\frac{k}{k_c}\right), \tag{6.37} $$

where $E = E(k_c)$ is the kinetic energy spectrum at the cutoff frequency $k_c$, and
$\nu_t^* = \nu_t^*(k/k_c)$ is a non-dimensional eddy viscosity. An advantage of this model is that
all backscatter effects
are  included.  Though  this  model  is  based  on  an  isotropic  turbulence  assumption,  satisfactory 
results  are  obtained  even  if  the  large  scales  are  neither  isotropic  nor  homogeneous  (Metais 
and  Lesieur,  1989;  Batchelor  et  al.,  1992).  The  main  drawback  to  this  model  is  that  it  is 
defined  in  Fourier  space,  so  employing  it  on  complex  geometries  that  necessitate  one  to  work 
in  physical  space  is  difficult. 

6.3.2.2 Structure-Function Model

Using the structure-function model presented by Metais and Lesieur (Metais and Lesieur,
1992), one works in physical space with the cutoff frequency defined as $k_c = \pi/\Delta x$. Assuming
that $k_c$ is in the inertial region of the Kolmogorov spectrum, from spectral eddy viscosity
models Metais and Lesieur arrived at

$$ \nu_t(x, \Delta x) = \frac{2}{3}\, C_k^{-3/2} \left[\frac{E_x(k_c)}{k_c}\right]^{1/2}, \tag{6.38} $$

where $E_x = E_x(k_c)$ is the local kinetic energy spectrum, and $C_k = 1.5$ is the universal
Kolmogorov constant. One advantage of this model is that it takes into account the local
intermittency of turbulence and reduces the eddy viscosity in regions where small-scale
turbulence has not developed. It works especially well for isotropic turbulence, where it
results in a Kolmogorov spectrum for the subgrid scales. Like the Smagorinski model, though,
it is too dissipative for low Mach number boundary layer transition in channel flow.

6.3.2.3 Scale Similarity Model

The  previously  discussed  models  are  all  eddy  viscosity  models,  and  as  such  are  based  on 
the  assumption  of  a  one-to-one  correlation  between  the  SGS  Reynolds  stress  and  the  resolved 
strain.  As  mentioned  in  regard  to  the  Smagorinski  model,  analysis  of  DNS  and  experiments 
have  shown  that  very  little  correlation  actually  exists.  The  scale  similarity  model  is  based  on 
the  idea  that  the  important  interactions  between  the  subgrid  and  resolved  scales  primarily 
entail  the  largest  eddies  of  the  subgrid  scales  and  the  smallest  eddies  of  the  resolved  scales. 
Using  this  idea,  Bardina  et  al.  (Bardina  et  al.,  1980)  used  the  small  scales  of  the  resolved 
field  to  determine  the  model  for  the  unresolved  scales.  The  SGS  Reynolds  stress  tensor  is 
typically  modeled  as 

$$ \tau_{ij} = \bar{\bar{u}}_i \bar{\bar{u}}_j - \overline{\bar{u}_i \bar{u}_j}, \tag{6.39} $$

where the double overbar denotes that the quantity has been filtered twice. This model does
not  dissipate  much  energy,  consequently  it  is  typically  combined  with  the  Smagorinski  model 
in  order  to  be  of  use.  The  resulting  mixed  model  has  been  used  with  considerable  success. 

6.3.2.4 Dynamic Models

The final class of models discussed here is termed dynamic models, and was first
introduced by Germano et al. (Germano et al., 1991). These models dynamically compute the
model coefficients as the simulation progresses based on the energy contained in the smallest
resolved scales, rather than imposing them in advance. A double-filtering approach is used to
calculate the SGS Reynolds stress tensor. In addition to the grid filter, a test filter of larger
width, $\alpha \Delta x$ with $\alpha > 1$, is applied. By explicitly calculating the transfers across the test
cutoff, an approximation to the transfer across the true cutoff $k_c$ can be made. Though it
is fraught with mathematical inconsistencies (Lesieur and Metais, 1996), this approach has
produced many results with superior agreement to DNS and experiments. The reasons for
this  are  numerous.  The  SGS  Reynolds  stresses  given  by  dynamic  models  correctly  vanish  in 
laminar  flows  and  at  solid  boundaries.  In  addition,  correct  asymptotic  behavior  in  the  near 
wall  region  is  produced.  Furthermore,  the  model  provides  backscatter  mechanisms.  Moin  et 
al.  (Moin  et  al.,  1991)  extended  the  model  to  compressible  flows,  with  much  success.  Zang 
and  Piomelli  (Zang  and  Piomelli,  1993)  and  Piomelli  (Piomelli,  1993)  simulated  transition  in 
plane  channel  flow,  and  produced  results  that  compare  very  well  with  DNS.  This  model  has 
even  produced  results  for  rotating  turbulence  (Squires  and  Piomelli,  1994)  that  agree  very 
well  with  experiments. 


6.4  Eddy  Viscosity  Implementation 

6.4.1  Nodal  Matrix  Equation 

Extension of the SFE3D solver to perform LES is rather straightforward. Considering
the flowfield variables discussed in Chapter 4 to now be the filtered variables, the addition of
two terms,

$$ \nu_t \nabla^2 \bar{u}_i + 2 \frac{\partial \nu_t}{\partial x_j} \bar{S}_{ij}, \tag{6.40} $$

to the right-hand side of the governing momentum equation is required. These terms are
treated in the same fashion as the convective terms, that is, with an explicit second-order
Adams-Bashforth temporal treatment using a pseudo-spectral approach (see Section 4.5). The
discretization procedure follows that outlined in Chapter 4. First, the eddy viscosity terms
are written in a form that separates the in-plane and transverse components. Utilizing the
in-plane gradient operator $\nabla$ introduced previously (so that $\nabla^2$ below denotes the in-plane
Laplacian) and tensor notation where the in-plane coordinate indices are $p = (1,2)$ and
$q = (1,2)$, (6.40) can be rewritten as

$$ \underbrace{\nu_t \nabla^2 u_p + \nu_t \frac{\partial^2 u_p}{\partial z^2}} + \frac{\partial \nu_t}{\partial x_q}\left(\frac{\partial u_p}{\partial x_q} + \frac{\partial u_q}{\partial x_p}\right) + \frac{\partial \nu_t}{\partial z}\left(\frac{\partial u_p}{\partial z} + \frac{\partial w}{\partial x_p}\right) \tag{6.41} $$

$$ \underbrace{\nu_t \nabla^2 w + \nu_t \frac{\partial^2 w}{\partial z^2}} + \frac{\partial \nu_t}{\partial x_q}\left(\frac{\partial w}{\partial x_q} + \frac{\partial u_q}{\partial z}\right) + 2 \frac{\partial \nu_t}{\partial z} \frac{\partial w}{\partial z} \tag{6.42} $$

for the in-plane and transverse momentum equations, respectively. Note that the underbraced
terms in the above expressions have the same form as the molecular diffusion terms. It is
important to note, however, that unlike the molecular viscosity, the eddy viscosity is not
constant.

In terms of the nodal matrix equation presented for the laminar solver (4.33), these
additional terms result in

[Equation (6.43): the nodal matrix equation (4.33) with the additional turbulent diffusion load vector.]

where the previously introduced terms are given in Chapter 4. The new turbulent diffusion
load vector is

[Equation (6.44)]

where

$$ \{K_{tn}\}_i = \begin{Bmatrix} (K_{txn})_i \\ (K_{tyn})_i \\ (K_{tzn})_i \\ (K_{tpn})_i \end{Bmatrix}. \tag{6.45} $$
Performing a SUPG/PSPG semi-discretization (as in Chapter 4) of the new terms (6.40)
gives

[Equations (6.46)-(6.49): SUPG/PSPG weighted-residual forms of the eddy viscosity terms.]

where $\mathbf{u}$ is the in-plane velocity vector and $\nabla$ the in-plane gradient operator. In the
previous expressions, the SUPG/PSPG terms have been identified via an underbrace and the
underlined terms vanish for a P1 element approximation.

As before, the transverse derivatives $(\partial/\partial z)_n$ and $(\partial^2/\partial z^2)_n$ are approximated using
second-order central finite differences. The values are calculated via (4.84) and (4.85),
respectively, and are then treated as any other nodal DOF in computing the weighted residual
form of the eddy viscosity terms.
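
As an illustration of this step, the sketch below evaluates second-order central differences for the first and second transverse derivatives. Expressions (4.84) and (4.85) are not reproduced here; the sketch assumes uniformly spaced planes that are periodic in $z$ (consistent with a Fourier representation), and the function names are hypothetical.

```python
import numpy as np


def ddz_central(f, dz):
    """Second-order central first derivative; axis 0 indexes the periodic z-planes."""
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2.0 * dz)


def d2dz2_central(f, dz):
    """Second-order central second derivative on the same periodic planes."""
    return (np.roll(f, -1, axis=0) - 2.0 * f + np.roll(f, 1, axis=0)) / dz ** 2


# Hypothetical test: 64 planes over a period of 2*pi, nodal value sin(z)
nz = 64
z = np.linspace(0.0, 2.0 * np.pi, nz, endpoint=False)
dz = z[1] - z[0]
f = np.sin(z)

err_first = np.max(np.abs(ddz_central(f, dz) - np.cos(z)))
err_second = np.max(np.abs(d2dz2_central(f, dz) + np.sin(z)))
print(err_first, err_second)
```

Both errors scale with the square of the plane spacing, as expected for second-order central differences.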

Finally, the SUPG/PSPG discretization is completed by considering each unknown $\varphi$ as

$$ \varphi = \sum_j N_j \varphi_j. \tag{6.50} $$

The SUPG terms in (6.46)-(6.48) are neglected in this work. The motivation for this is that
these terms require a summation over four indices, which is a computationally expensive
procedure, especially considering that the terms are small in comparison to the others. The
last two integrals in (6.49) are also neglected because they are small yet expensive to compute.
Testing via application to the case of flow past a circular cylinder at Re = 3900 showed no
apparent stability issues as a result of omitting these stabilization terms. The expressions for
the turbulent diffusion terms now become

[Equations (6.51)-(6.54): turbulent diffusion terms with the PSPG contributions retained.]

As before, in the above expressions the underlined terms vanish for P1 element approximations,
and repeated subscripts indicate summation.


6.4.2  Analytical  Coefficient  Evaluation 


Because linear triangle elements are used, each of the coefficients appearing in the
turbulent diffusion load vector (6.45) can be easily evaluated analytically. After integration of
(6.51)-(6.54) one obtains

[Equations (6.55)-(6.58): analytically integrated coefficients of the turbulent diffusion load vector.]


6.4.3  Smagorinski  Model 

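The Smagorinski model was discussed in some detail in Section 6.3.1. In this approach,
the eddy viscosity at node $\ell$ of 2D plane $n$ is modeled as

$$ \nu_{t,n\ell} = C_s^2 \Delta_{n\ell}^2 |\bar{S}|_{n\ell}, \tag{6.59} $$

where $C_s$ is the Smagorinski constant, $\Delta_{n\ell}$ is the filter width, and $|\bar{S}|_{n\ell}$ is the strain
rate of the resolved scales,

$$ |\bar{S}|_{n\ell} = \left(2 \bar{S}_{ij} \bar{S}_{ij}\right)^{1/2}. \tag{6.60} $$

Expanding the summations in $|\bar{S}|_{n\ell}$ gives

$$ |\bar{S}|_{n\ell} = \left[2\left(\bar{S}_{11}^2 + \bar{S}_{22}^2 + \bar{S}_{33}^2\right) + 4\left(\bar{S}_{12}^2 + \bar{S}_{13}^2 + \bar{S}_{23}^2\right)\right]^{1/2}_{n\ell}. \tag{6.61} $$

Substituting the strain rate definition (6.24) into (6.61) and rearranging gives

$$ |\bar{S}|_{n\ell} = \left[2\left(\frac{\partial u}{\partial x}\right)^2 + 2\left(\frac{\partial v}{\partial y}\right)^2 + 2\left(\frac{\partial w}{\partial z}\right)^2 + \left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial z} + \frac{\partial w}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial z} + \frac{\partial w}{\partial y}\right)^2\right]^{1/2}_{n\ell}, \tag{6.62} $$

where $u$, $v$, and $w$ denote the filtered velocity components in the $x$, $y$, and $z$ directions.

The $z$-derivatives $(\partial/\partial z)_{n\ell}$ are calculated via the FD expression (4.84). The in-plane
derivatives at node $\ell$ are approximated as the average of the derivatives evaluated within
each element associated with the node:

$$ \left(\frac{\partial f}{\partial x}\right)_{\ell} = \frac{1}{EPN_\ell} \sum_{e=1}^{EPN_\ell} \sum_j \left(\frac{\partial N_j}{\partial x}\right)_e f_j, \tag{6.64} $$

$$ \left(\frac{\partial f}{\partial y}\right)_{\ell} = \frac{1}{EPN_\ell} \sum_{e=1}^{EPN_\ell} \sum_j \left(\frac{\partial N_j}{\partial y}\right)_e f_j, \tag{6.65} $$

where $EPN_\ell$ is the number of elements associated with node $\ell$.
For P1 elements,

$$ \frac{\partial N_j}{\partial x} = \frac{x_{n_j}}{2 S_T}, \tag{6.66} $$

$$ \frac{\partial N_j}{\partial y} = \frac{y_{n_j}}{2 S_T}, \tag{6.67} $$

where $x_{n_j}$ and $y_{n_j}$ are the x- and y-components of the element inward normal vector at side
$j$, respectively, and $S_T$ is the element area. Expressions (6.64) and (6.65) then become

$$ \left(\frac{\partial f}{\partial x}\right)_{\ell} = \frac{1}{EPN_\ell} \sum_{e=1}^{EPN_\ell} \frac{1}{2 S_{T,e}} \sum_j x_{n_j} f_j, \tag{6.68} $$

$$ \left(\frac{\partial f}{\partial y}\right)_{\ell} = \frac{1}{EPN_\ell} \sum_{e=1}^{EPN_\ell} \frac{1}{2 S_{T,e}} \sum_j y_{n_j} f_j. \tag{6.69} $$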
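
The nodal averaging of element gradients described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the SFELES routine: the function name `p1_nodal_gradient` and the data layout are hypothetical, triangles are assumed to be numbered counter-clockwise, and the inward normal of each side is taken to be scaled by the side length, which is what makes $\partial N_j/\partial x = x_{n_j}/(2 S_T)$ hold for linear triangles.

```python
import numpy as np


def p1_nodal_gradient(xy, tri, f):
    """Nodal gradient of f as the average of the constant P1 element gradients.

    xy:  (n_nodes, 2) node coordinates
    tri: (n_elem, 3) node indices, counter-clockwise
    f:   (n_nodes,) nodal values
    """
    grad_sum = np.zeros((len(xy), 2))
    elems_per_node = np.zeros(len(xy))
    for nodes in tri:
        p = xy[nodes]
        area = 0.5 * ((p[1, 0] - p[0, 0]) * (p[2, 1] - p[0, 1])
                      - (p[2, 0] - p[0, 0]) * (p[1, 1] - p[0, 1]))
        grad = np.zeros(2)
        for j in range(3):
            a, b = p[(j + 1) % 3], p[(j + 2) % 3]      # side opposite local node j
            inward_normal = np.array([a[1] - b[1], b[0] - a[0]])  # length-scaled
            grad += inward_normal * f[nodes[j]] / (2.0 * area)
        grad_sum[nodes] += grad
        elems_per_node[nodes] += 1
    return grad_sum / elems_per_node[:, None]


# Hypothetical mesh: unit square split into two triangles, linear field
xy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
tri = np.array([[0, 1, 2], [0, 2, 3]])
f = 3.0 * xy[:, 0] - 2.0 * xy[:, 1] + 1.0
grad = p1_nodal_gradient(xy, tri, f)
print(grad)
```

For a linear field every element gradient is exact, so the averaged nodal gradient equals (3, -2) at all four nodes. For a general field the average is only an approximation whose quality depends on the local mesh.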
Deardorff (Deardorff, 1970), Moin and Kim (Moin and Kim, 1982), and others suggest
the filter width $\Delta$ be the characteristic length

$$ \Delta = (\Delta_1 \Delta_2 \Delta_3)^{1/3}, \tag{6.70} $$

where $\Delta_i$ is the element length in the $i$ direction. The triangular prism elements used in this
solver do not lend themselves well to this definition. Taking inspiration from (6.70), however,
in this work the filter width $\Delta$ is taken to be

$$ \Delta = V_e^{1/3}, \tag{6.71} $$

where $V_e$ is the volume of the element.
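
A minimal sketch of (6.71) follows. It assumes that the volume of a triangular prism element is the in-plane triangle area times the transverse plane spacing; the function and argument names are hypothetical.

```python
def prism_filter_width(triangle_area, dz):
    """Filter width Delta = V_e^(1/3) for a triangular prism element.

    Assumes V_e = (in-plane triangle area) * (transverse plane spacing dz).
    """
    return (triangle_area * dz) ** (1.0 / 3.0)


# Hypothetical element: triangle area 0.5, plane spacing 0.016 -> V_e = 0.008
delta = prism_filter_width(0.5, 0.016)
print(delta)
```

For a cube-like element this reduces to the characteristic length of (6.70), which is the sense in which (6.71) takes its inspiration from that definition.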


6.5  Near- Wall  Treatment 


Wall functions are not implemented in the SFELES solver. Instead, the grid must be refined sufficiently to resolve the near-wall gradients. This requires that the first grid point be located at a distance $y^+ < 2$ from the wall, and that the streamwise and lateral grid spacings be of order $\Delta x^+ \sim 50$ to $150$ and $\Delta z^+ \sim 15$ to $40$, respectively. In the previous expressions, the distances are given in viscous wall units:

$$\eta^+ = \frac{\eta \sqrt{\tau_w/\rho}}{\nu}, \tag{6.72}$$

where $\tau_w$ is the shear stress at the wall. This approach therefore cannot be used at extremely high Reynolds numbers, because the number of grid points needed to satisfy these limits grows as the Reynolds number increases.


As mentioned in Section 6.3.1, the Smagorinski constant must be reduced near solid walls. A van Driest damping function is used in SFELES to accomplish this. That is, the Smagorinski model (6.29) becomes

$$\nu_t = (C_s f_\mu)^2 \Delta^2 |S|, \tag{6.73}$$

where the damping function is

$$f_\mu = 1 - e^{-y^+/A^+}, \tag{6.74}$$

$y^+$ is the normal distance to the nearest solid wall in wall units, and $A^+ = 25$.

Evaluating the distance between a node and the nearest wall is accomplished by the straightforward approach of looping over all boundary segments for each node and determining the smallest distance. This is a somewhat expensive process, but it need only be performed once at the beginning of a simulation.
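The brute-force wall-distance search and the van Driest damping of (6.73) and (6.74) can be illustrated with the following Python sketch. It is a hedged example, not the SFELES code: all function names are hypothetical, walls are assumed to be given as straight 2D segments, and the friction velocity `u_tau` (the square root of wall shear stress over density) is assumed to be supplied by the caller, since the text does not say how the local wall shear stress is associated with each node.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (all 2D tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        t = 0.0  # degenerate segment: treat it as a single point
    else:
        # Projection parameter, clamped to the segment end points.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def wall_distances(nodes, wall_segments):
    """For every node, loop over all wall segments and keep the smallest distance."""
    return [
        min(point_segment_distance(p, a, b) for a, b in wall_segments)
        for p in nodes
    ]


def van_driest(y_plus, a_plus=25.0):
    """Damping function of (6.74): 1 - exp(-y+/A+)."""
    return 1.0 - math.exp(-y_plus / a_plus)


def damped_eddy_viscosity(c_s, delta, strain_mag, y, u_tau, nu):
    """Eddy viscosity of (6.73) with the model constant damped near the wall."""
    y_plus = y * u_tau / nu  # wall distance in viscous units
    return (c_s * van_driest(y_plus) * delta) ** 2 * strain_mag
```

The search costs one distance evaluation per node per wall segment, which is why it is done once and the result stored. Faster spatial searches exist, but the text indicates the straightforward loop is considered adequate.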

6.6  Aliasing  Errors 


When a pseudo-spectral approach is used for the nonlinear terms, aliasing errors are introduced, especially in convection-dominated flows. These errors occur any time nonlinear terms are evaluated approximately in physical space. For instance, consider the 1D case

$$s(k) = a(k)\,b(k), \tag{6.75}$$

where

$$a(k) = \mathcal{F}[a(x)], \tag{6.76}$$

$$b(k) = \mathcal{F}[b(x)]. \tag{6.77}$$

If a and b have harmonic components up to N, where N is the number of Fourier modes in the DFT, the product ab has harmonic components up to 2N. The aliasing errors come about because the DFT with N points cannot distinguish between wavenumbers that differ by a multiple of N.
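The statement that an N-point DFT cannot tell apart wavenumbers separated by a multiple of N can be checked directly on the sample values. The short Python sketch below is illustrative only; the helper `sample_cosine` and the choice N = 8 are assumptions made for the example.

```python
import math


def sample_cosine(wavenumber, n_points):
    """Sample cos(k x) on the n_points-point periodic grid x_j = 2 pi j / n_points."""
    return [
        math.cos(2.0 * math.pi * wavenumber * j / n_points)
        for j in range(n_points)
    ]


n = 8
resolved = sample_cosine(2, n)      # wavenumber 2, representable on the grid
shifted = sample_cosine(2 + n, n)   # wavenumber 10, differs from 2 by N
folded = sample_cosine(n - 2, n)    # wavenumber 6 = -2 + N, folds back onto 2

max_diff_shifted = max(abs(a - b) for a, b in zip(resolved, shifted))
max_diff_folded = max(abs(a - b) for a, b in zip(resolved, folded))
```

Both differences vanish to round-off, so energy that a nonlinear product places at wavenumber 6 or 10 is recorded by the 8-point DFT as if it belonged to wavenumber 2.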

Much effort has been expended in determining ways to control or remove aliasing errors in pseudo-spectral approaches to solving the NS equations (Canuto et al., 1988; Orszag, 1971b; Patterson and Orszag, 1971). The process of removing aliasing errors is termed dealiasing. The approach that has become standard in spectral methods for DNS and LES is 3/2-dealiasing. The key to this technique is to use a DFT with at least $\tfrac{3}{2}N$ points, where N is again the number of Fourier modes. That is, because the nonlinear interactions generate modes up to double the original cutoff frequency, aliasing can be eliminated by evaluating the convective terms at more nodes and only keeping the lowest N modes from the DFT. It has been shown (Orszag, 1971b) that a minimum of $\tfrac{3}{2}N$ nodes is required for dealiasing. This approach comes at a substantial cost because the evaluation of the convective terms, which can take up to 25 to 30% of the overall computation time, is increased by 50%. In addition, the number of discrete points seen by the DFT is no longer a power of two. Although fast DFTs exist for this case, they are not as efficient as those developed for $2^n$ points (Press et al., 1992).
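The 3/2-rule described above can be demonstrated with a small, self-contained Python sketch. It is an illustration under assumptions, not the solver's implementation: a plain O(M^2) DFT is used instead of an FFT for clarity, the coefficients are normalized by the number of points, N is assumed even, and all function names are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Normalized forward DFT: c_k = (1/M) sum_j f_j exp(-2 pi i j k / M)."""
    m = len(samples)
    return [
        sum(f * cmath.exp(-2j * math.pi * j * k / m) for j, f in enumerate(samples)) / m
        for k in range(m)
    ]


def idft(coeffs):
    """Inverse of dft: f_j = sum_k c_k exp(+2 pi i j k / M)."""
    m = len(coeffs)
    return [
        sum(c * cmath.exp(2j * math.pi * j * k / m) for k, c in enumerate(coeffs))
        for j in range(m)
    ]


def pad(coeffs, m):
    """Zero-pad an N-mode spectrum (N even, standard DFT ordering) to m modes."""
    half = len(coeffs) // 2
    out = [0j] * m
    out[:half] = coeffs[:half]        # wavenumbers 0 .. N/2 - 1
    out[m - half:] = coeffs[half:]    # wavenumbers -N/2 .. -1
    return out


def truncate(coeffs, n):
    """Keep only the lowest n modes of a longer spectrum."""
    half = n // 2
    return coeffs[:half] + coeffs[len(coeffs) - half:]


def product_aliased(a_hat, b_hat):
    """Pseudo-spectral product evaluated on the original N-point grid."""
    a, b = idft(a_hat), idft(b_hat)
    return dft([x * y for x, y in zip(a, b)])


def product_dealiased(a_hat, b_hat):
    """Pseudo-spectral product evaluated on a 3N/2-point grid, then truncated."""
    n = len(a_hat)
    m = 3 * n // 2
    a, b = idft(pad(a_hat, m)), idft(pad(b_hat, m))
    return truncate(dft([x * y for x, y in zip(a, b)]), n)


# cos(3x) * cos(3x) = 1/2 + cos(6x)/2; wavenumber 6 is not resolved with N = 8.
n_modes = 8
a_hat = dft([math.cos(2.0 * math.pi * 3 * j / n_modes) for j in range(n_modes)])
aliased = product_aliased(a_hat, a_hat)
dealiased = product_dealiased(a_hat, a_hat)
```

On the 8-point grid the unresolved wavenumber 6 folds onto wavenumber 2, so `aliased[2]` carries a spurious amplitude of 0.25. With the 12-point evaluation the same contribution lands outside the retained modes and is discarded by the truncation, leaving only the mean value 0.5 in `dealiased[0]`.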

Dealiasing can be avoided if the aliasing errors are small compared to the truncation errors and the SGS terms. For the divergence and convective forms of the convective terms, aliased spectral methods are not energy conserving and can be unstable. It has been shown, however, that if the skew-symmetric or rotational forms are used, spectral methods are energy conserving even in the presence of aliasing errors (Zang, 1991; Kravchenko and Moin, 1997). It has also been shown that when the skew-symmetric form is used, the results are well-behaved without dealiasing and that the aliasing errors, especially at the higher wavenumbers, are significantly reduced (Kravchenko and Moin, 1997).

To address aliasing errors in the SFELES solver, the skew-symmetric form of the convective terms is used. This form is an average of the convective and divergence forms, or

$$\tfrac{1}{2}\left[\nabla \cdot (\mathbf{u}\mathbf{u}) + \mathbf{u} \cdot \nabla \mathbf{u}\right]. \tag{6.78}$$

Rearranging this expression gives

$$\mathbf{u} \cdot \nabla \mathbf{u} + \tfrac{1}{2}\mathbf{u}(\nabla \cdot \mathbf{u}). \tag{6.79}$$

The first term in the above expression is the convective form used in the previous discussions of the FE/spectral discretization. Implementing the skew-symmetric form simply requires the term

$$\tfrac{1}{2}\mathbf{u}(\nabla \cdot \mathbf{u}) \tag{6.80}$$

to be added to the convective load vector in (6.43). In physical space, the following terms are added in (4.91)

(ssxn)i  =  xrij  +  Vj  yrij)  +  max[l,  2(5y  +  Su  +  5j()]  (6.81) 

(ssyn)i  =  ij  xnj  +  Vj  y7ij )  +  ~vt  max[l,  2{5ij  +  5a  +  fy)]  (6.82) 

(sszn)i  =  1  5l--we(uj  Xn3  +  v3  Vn3 )  +  i^z)  “““I1’  2(<^'  +  +  6jt)\  (6.83) 

(sspn)i  =  0.  (6.84) 

SUPG/PSPG  stabilization  is  neglected  for  the  skew-symmetric  terms  because  it  requires  a 
summation  over  four  indices,  which  is  a  computationally  expensive  procedure  considering 
that  the  terms  are  very  small  in  comparison  to  the  others. 
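The rearrangement from (6.78) to (6.79) follows from the product rule applied to the divergence form. The short derivation below is added as a worked step; the index notation (summation over the repeated index j) is an assumption about notation and is not taken from the original text.

```latex
\begin{align*}
\left[\nabla\cdot(\mathbf{u}\mathbf{u})\right]_i
  &= \frac{\partial (u_i u_j)}{\partial x_j}
   = u_j\,\frac{\partial u_i}{\partial x_j} + u_i\,\frac{\partial u_j}{\partial x_j}
   = \left[\mathbf{u}\cdot\nabla\mathbf{u}\right]_i
     + \left[\mathbf{u}\,(\nabla\cdot\mathbf{u})\right]_i, \\[4pt]
\tfrac{1}{2}\left[\nabla\cdot(\mathbf{u}\mathbf{u}) + \mathbf{u}\cdot\nabla\mathbf{u}\right]
  &= \tfrac{1}{2}\left[2\,\mathbf{u}\cdot\nabla\mathbf{u}
     + \mathbf{u}\,(\nabla\cdot\mathbf{u})\right]
   = \mathbf{u}\cdot\nabla\mathbf{u} + \tfrac{1}{2}\,\mathbf{u}\,(\nabla\cdot\mathbf{u}).
\end{align*}
```

For an exactly divergence-free field the extra term vanishes, so the three forms coincide analytically. They differ only at the discrete level, where the divergence is not identically zero, which is why the added term is small but not negligible for the energy-conservation property.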
Fig. 6.3 - Filtering
sharp  Fo 
the  filter 


Fig. 6.4 - Attenuation factors of the three most common filter functions: Gaussian, tophat, and sharp Fourier cutoff.
7. LARGE-EDDY SIMULATION TEST CASES


A  validation  test  case  of  flow  past  a  circular  cylinder  at  Re  =  3900  is  presented  in  this 
chapter.  Results  in  terms  of  qualitative  behavior  and  quantitative  velocity  and  turbulence 
values  are  analyzed.  The  SFELES  solver  is  then  used  to  investigate  the  effects  of  a  wake 
splitter  plate  attached  to  a  cylinder.  Results  are  presented  for  multiple  splitter  plate  lengths 
at  Re  =  3900. 

7.1  Circular  Cylinder 

7.1.1  Problem  Description 

Discussions of laminar circular cylinder flow have been presented in Sections 3.3 and 5.1. According to Prasad and Williamson (Prasad and Williamson, 1997), the onset of turbulence occurs at $Re \approx 1200$; it is at this Reynolds number that the shear layers become unstable. However, there is no firm consensus on the value of this critical Reynolds number; other studies have stated it to be anywhere from 300 to 3000. In this study, Re = 3900 is chosen because it is definitely above the critical Reynolds number and yet low enough that an exorbitant number of nodes is not required near the solid walls. In addition, this has been a popular Reynolds number for studies of this kind, and PIV and hot wire data are available from Lourenco and Shih (Lourenco and Shih, 1993) and Ong and Wallace (Ong and Wallace, 1996), respectively. Many numerical simulations have also been performed at this Reynolds number. These include Beudan and Moin (Beudan and Moin, 1994), who used a high-order upwind FD scheme, Mittal and Moin (Mittal and Moin, 1997), who used a combined 2nd-order central FD/spectral scheme, and Kravchenko and Moin (Kravchenko and Moin, 1998), who used a B-spline method. Several important lessons have been learned from these studies. The first is that energy-conserving schemes are more suitable for these simulations. Also, the dissipation due to truncation errors can become more substantial than that of the SGS model in some situations when using lower-order FD schemes. Though the numerical simulations produced results in good agreement with the experiments in the near wake (x/D < 4), excessive numerical dissipation caused under-prediction of the turbulent fluctuations farther downstream. This numerical dissipation was most pronounced in the upwind FD simulations.

This test case serves to validate the SFELES solver, at least at this relatively low Reynolds number. The primary goal is not to provide new information regarding the physics of the flow, but rather to compare the results of this work, both qualitative and quantitative, with published values.
7.1.2 Computational Mesh


A perspective view of the computational mesh used for this study is shown in Figure 7.1. A Dirichlet velocity inlet condition is applied 5D upstream of the cylinder center, while a Neumann outlet condition is applied 20D downstream. Slip-wall conditions are applied at the lateral extents of the domain, located 7D from the cylinder center. As discussed in Section 5.2, a shedding frequency slightly higher than in freestream flow is expected due to the proximity of these lateral walls. The spanwise dimension is $L_z/D = \pi$. This is the same dimension used in the numerical simulations mentioned in the previous section (Beudan and Moin, 1994; Mittal and Moin, 1997; Kravchenko and Moin, 1998) and was found in these studies to be sufficiently large.


Fig.  7.1  -  Turbulent  circular  cylinder  flow:  overview  of  the  computational  domain  and  mesh 
containing  23,500  in-plane  triangle  elements  (12,000  in-plane  nodes)  and  32  Fourier 
modes. 

In the 2D plane the mesh consists of 23,500 triangle elements with approximately 12,000 nodes. As seen in Figure 7.1, nodes are clustered in the wake region, with characteristic element side lengths of 0.10D in the near wake and 0.19D farther downstream. Outside the wake region the mesh is coarser, with a characteristic element side length of 0.5D at the upstream and lateral domain extents. Elements are also clustered toward the cylinder in order to better capture the initial development of the turbulent wake, as illustrated in Figure 7.2. Boundary layer wedge-type elements are used very near the cylinder walls, with the perpendicular dimension of the first element off the cylinder being 0.0025D. A posteriori analysis shows the first node located at $y^+ = 1.8$, and around 5 nodes inside the boundary layer at $\theta = 45^\circ$ (with $0^\circ$ being at y = 0 on the upstream side of the cylinder). There are 110 nodes on the cylinder wall, resulting in an element side length of 0.0275D. The mesh in this region is shown in Figure 7.3(a).


Fig. 7.2 - Turbulent circular cylinder flow: view of the 2D mesh in the near-wake region.


According to the studies by Mansy et al. (Mansy et al., 1994) and Williamson et al. (Williamson et al., 1995), the spanwise wavelength of the fluid structures near the cylinder scales as

$$\lambda_z/D = a\,Re^{-1/2}, \tag{7.1}$$

where $a \approx 20$ to $25$. At Re = 3900 the structures are expected to have wavelengths $\lambda_z/D \approx 0.4$. Using a spectral discretization in the transverse direction, at least two points are required per wavelength. Consequently, with $L_z/D = \pi$ a minimum of 16 transverse points is required. In this study, 32 points were used in order to ensure sufficient resolution. The overall mesh contains approximately 380,000 nodes.
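The resolution estimate above is simple arithmetic and can be reproduced as follows. The Python sketch is illustrative; the function names are hypothetical, and the default of two points per wavelength is the minimum quoted in the text.

```python
import math


def spanwise_wavelength(reynolds, a):
    """Wavelength over diameter from the scaling (7.1): a * Re^(-1/2)."""
    return a / math.sqrt(reynolds)


def min_spanwise_points(span_over_d, wavelength_over_d, points_per_wavelength=2):
    """Smallest number of points that resolves the given wavelength over the span."""
    return math.ceil(points_per_wavelength * span_over_d / wavelength_over_d)


re = 3900
wavelength_a25 = spanwise_wavelength(re, 25.0)  # about 0.40
wavelength_a20 = spanwise_wavelength(re, 20.0)  # about 0.32

points_a25 = min_spanwise_points(math.pi, wavelength_a25)
points_a20 = min_spanwise_points(math.pi, wavelength_a20)

# Total node count for the coarse mesh: in-plane nodes times spanwise planes.
total_nodes = 12_000 * 32
```

With a = 25 the estimate gives the 16 points quoted in the text; the lower bound a = 20 would call for 20 points. Both are below the 32 points actually used, and the product of 12,000 in-plane nodes and 32 planes gives 384,000, consistent with the stated total of about 380,000.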

A refined mesh was also generated to verify sufficient spatial resolution. This mesh contains 43,000 in-plane elements with approximately 22,000 nodes. Characteristic element side lengths in the wake region range from 0.071D near the cylinder to 0.22D near the outlet. The perpendicular dimension of the first wedge-type element off the cylinder is 0.0015D, and the typical element side length along the cylinder wall is 0.017D. A posteriori analysis shows the first node off the cylinder located at $y^+ = 1.0$ and about 8 nodes inside the boundary layer at $\theta = 45^\circ$. The mesh in this region is shown in Figure 7.3(b). No refinement was made in the transverse direction because 32 planes are already sufficient to resolve wavelengths half the size expected. The refined mesh contains approximately 700,000 nodes.
Fig. 7.3 - Turbulent circular cylinder flow: view of the 2D mesh very near the cylinder, showing the wedge-type boundary layer elements for the (a) coarse and (b) fine meshes.


7.1.3  Results 

Both the coarse and fine mesh simulations were initialized using potential flow. To decrease computation time, the solution was allowed to develop as a 2D flow until the Karman vortex street was fully developed. To hasten the development of asymmetric shedding, a time-dependent rotation was applied to the cylinder at the beginning of the simulation for a short time, first in one direction and then in the opposite. After the 2D Karman vortex street was sufficiently developed, a small perturbation was applied at the inlet boundary for a short time to initiate the 3D features of the flow. The simulation was then allowed to advance until a statistically steady vortex shedding developed. Finally, the simulation was run for an additional six shedding cycles, $T U_\infty/D \approx 30$, to allow calculation of the flow statistics. The time step throughout the simulation was fixed at $\Delta t\,U_\infty/D = 0.001$. No turbulent fluctuations are introduced at the inlet. Because Re = 3900 is in the subcritical Reynolds number regime (Re < 100,000), the free-stream flow and cylinder boundary layers are laminar; turbulence develops only in the shear layers and wake region.

An instantaneous perspective view of the resulting flowfield is shown in Figure 7.4 via surfaces of constant vorticity magnitude. A closer view of the vorticity iso-surfaces is shown in Figure 7.5. The long shear layers attached to the cylinder are seen to roll up into the familiar Karman vortex street. The recirculation region is predicted to extend approximately 1D downstream of the cylinder, which is in agreement with experimental observations (Williamson, 1996b). The surfaces in Figures 7.4 and 7.5 are colored according to velocity magnitude, with blue representing low values and red representing high values. The fluid inside the recirculation region is slow moving, but contains small-scale turbulent structures. Figure 7.6 shows a bottom view of the vorticity magnitude iso-surfaces. In this figure one sees the separating shear layers roll into vortices with diameters on the order of 1D, which is confirmed by the experimental observations of Chyu and Rockwell (Chyu and Rockwell, 1996) and Prasad and Williamson (Prasad and Williamson, 1997) at this Reynolds number.


Fig. 7.4 - Turbulent circular cylinder flow: perspective view of instantaneous surfaces of constant vorticity magnitude, $\omega D/U_\infty = 2.5$. The surfaces are colored according to velocity magnitude, with blue representing low values and red representing high values.

Figures 7.7-7.9 show contours of constant streamwise, lateral, and transverse velocity, respectively, on the mid-plane of the channel (y = 0 plane). From the streamwise velocity contours, the extent of the recirculation region is easily seen, along with the high levels of turbulent fluctuations inside the region. Figure 7.8 shows alternating regions of positive and negative lateral velocity in the streamwise direction, highlighting the Karman vortex street. These figures, particularly the transverse velocity contours, show that small-scale structures are dominant near the cylinder, with larger scales dominating downstream. According to Williamson et al. (Williamson et al., 1995) and Chyu and Rockwell (Chyu and Rockwell, 1996), the fluid structures far downstream of the cylinder have experimentally observed wavelengths

$$\lambda_z/D \approx 1. \tag{7.2}$$

The SFELES solution agrees well with this observation. It is important to notice that small-scale fluctuations do exist far downstream in Figures 7.7-7.9, though they are dominated by the large structures. The upwind FD simulations of Beudan and Moin (Beudan and Moin, 1994) show no small-scale turbulence in the far wake, though other numerical and experimental studies do. This emphasizes that the SUPG formulation does not suffer from the overly diffusive nature of typical upwind schemes.

Fig. 7.5 - Turbulent circular cylinder flow: front view of instantaneous surfaces of constant vorticity magnitude, $\omega D/U_\infty = 2.5$, in the region -0.5D < x < 10D. The surfaces are colored according to velocity magnitude, with blue representing low values and red representing high values.

The  final  qualitative  results  presented  illustrate  the  development  of  the  alternating  vortex 
shedding  phenomenon.  Figure  7.10  shows  vorticity  magnitude  iso-surfaces  in  the  very  near 
wake  at  five  instants  during  the  shedding  cycle.  In  the  first  frame  the  bottom  vortex,  which 
is  completely  rolled  up,  is  beginning  to  wash  downstream.  At  the  same  instant,  the  top 
vortex  is  already  formed  and  in  the  process  of  rolling  up.  In  the  second  and  third  frames 
the  top  vortex  grows  in  size,  which  draws  fluid  into  the  wake  from  the  opposite  side.  Some 
of  this  fluid  is  entrained  into  the  top  vortex,  while  some  replenishes  the  lost  fluid  in  the 
recirculation  region.  By  the  fourth  frame  the  top  vortex  is  completely  rolled  up  and  is  being 
carried  downstream,  while  the  development  of  the  bottom  vortex  is  well  underway.  Finally, 
in  the  fifth  frame  the  bottom  vortex  is  nearly  rolled  up  and  the  process  repeats.  These  results 
agree with the observations of Gerrard (Gerrard, 1966) and Perry et al. (Perry et al., 1982)
regarding  the  physical  mechanics  of  the  formation  region  behind  bluff  bodies. 

As mentioned previously, statistical information was gathered over six shedding cycles. Table 7.1 shows quantitative results for mean drag coefficient ($C_{Dp}$), minimum mean streamwise velocity ($U_{min}$), formation length of the recirculation region ($L_f$), location of the shear layer separation points ($\theta_{sep}$), and non-dimensional shedding frequency (St). These results are compared with available experimental measurements. With the exception of $C_{Dp}$, the SFELES results are within the experimental error bounds.

Fig. 7.7 - Turbulent circular cylinder flow: instantaneous streamwise velocity contours on the y = 0 plane and in the region -0.5D < x < 10D. Shown are 16 contours from -1.2 < u < 1.2, with the u = 0 contour highlighted.

Table 7.1 - Turbulent circular cylinder flow: comparison of flow parameters from SFELES at Re = 3900 with various experiments. C_Dp is from Norberg (Re = 4020) (Norberg, 1993), U_min from Lourenco and Shih (Lourenco and Shih, 1993), L_f/D from Cardell (Cardell, 1993), θ_sep from Son and Hanratty (Re = 5000) (Son and Hanratty, 1969), and St from Ong and Wallace (Ong and Wallace, 1996). Unless

              C_Dp           U_min          L_f/D        θ_sep       St
SFELES        1.09           -0.29          1.30         88.0°       0.2179
Experiment    0.99 ± 0.05    -0.24 ± 0.1    1.4 ± 0.1    86° ± 2°    0.215 ± 0.005
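The claim that every parameter except the drag coefficient falls within the experimental error bounds can be checked directly against the values in Table 7.1. The Python sketch below is illustrative; the dictionary layout, the function name, and the small tolerance used to keep values lying exactly on a bound from being rejected by floating-point round-off are assumptions made for the example.

```python
# (SFELES value, experimental value, experimental uncertainty) from Table 7.1.
TABLE_7_1 = {
    "C_Dp": (1.09, 0.99, 0.05),
    "U_min": (-0.29, -0.24, 0.1),
    "L_f/D": (1.30, 1.4, 0.1),
    "theta_sep_deg": (88.0, 86.0, 2.0),
    "St": (0.2179, 0.215, 0.005),
}


def outside_error_bounds(table, tol=1e-9):
    """Return the names of parameters whose computed value misses the quoted bounds."""
    return [
        name
        for name, (computed, measured, uncertainty) in table.items()
        if abs(computed - measured) > uncertainty + tol
    ]


misses = outside_error_bounds(TABLE_7_1)
```

Only the drag coefficient is flagged. The formation length and the separation angle sit exactly on the edge of their respective bounds, which is worth keeping in mind when reading the comparison.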

More statistical information, in terms of mean velocities and turbulence statistics, is compared with experimental and numerical studies on the following pages. The experimental studies entail the PIV measurements of Lourenco and Shih (Lourenco and Shih, 1993) and the hot wire measurements of Ong and Wallace (Ong and Wallace, 1996). The numerical study used for comparison is that of Kravchenko and Moin (Kravchenko and Moin, 1998), where a B-spline method was employed. This study is chosen because the results are in good agreement with experimental measurements as well as with the numerical results of Beudan and Moin (Beudan and Moin, 1994) and Mittal and Moin (Mittal and Moin, 1997) (at least in the near wake, x/D < 4), where second-order upwind and central FD methods were used, respectively. The results of Kravchenko and Moin (Kravchenko and Moin, 1998) were obtained on a mesh with approximately 1.3 million nodes, nearly double the size of the fine mesh used in this work; that mesh extended 30D from the cylinder in the 2D plane and had a spanwise dimension of $\pi D$. SFELES results are presented for both the coarse and fine meshes. The profiles used for comparison were digitized from published plots; consequently, an error estimated at ±5% is introduced.

Fig. 7.9 - Turbulent circular cylinder flow: instantaneous transverse velocity contours on the y = 0 plane and in the region -0.5D < x < 10D. Shown are 16 contours from -1.0 < w < 1.0, with the w = 0 contour highlighted.


Figure 7.11 shows the mean streamwise velocity along the y = 0 line in the wake up to x/D = 10. The SFELES results show better agreement with the PIV measurements than do the B-spline results in terms of the size and strength of the recirculation region. In the region 2.5 < x/D < 4, the SFELES results do not agree with the PIV. However, the PIV results (Lourenco and Shih, 1993) are the only ones to show a dip in the streamwise velocity in this region.

Mean streamwise and lateral velocity profiles at three streamwise locations in the wake (x/D = 1.54, 3.00, and 5.00) are shown in Figures 7.12 and 7.13, respectively. The results are in reasonable agreement, with the largest discrepancies occurring at the station inside the recirculation region at x/D = 1.54. Finally, turbulence statistics profiles are shown at the same three streamwise stations in Figures 7.14-7.16. Again, the SFELES results are in agreement with the published values. In fact, the SFELES solver generally produces results in better agreement with the experiments than the B-spline solver (especially in the very near wake).
Fig. 7.10 - Turbulent circular cylinder flow: instantaneous surfaces of constant vorticity magnitude, $\omega D/U_\infty = 2.5$, in the very near wake at five instants during the shedding cycle. T is the time interval of one complete shedding cycle.

Fig. 7.11 - Turbulent circular cylinder flow: mean streamwise velocity along the line y = 0 at Re = 3900. Lines: SFELES fine grid, SFELES coarse grid, and B-spline simulations by Kravchenko and Moin (Kravchenko and Moin, 1998). Symbols: PIV measurements of Lourenco and Shih (□) (Lourenco and Shih, 1993) and hot wire measurements of Ong and Wallace (Ong and Wallace, 1996).

Fig. 7.12 - Turbulent circular cylinder flow: mean streamwise velocity at different streamwise locations in the wake at Re = 3900. Lines: SFELES fine grid, SFELES coarse grid, and B-spline simulations by Kravchenko and Moin (Kravchenko and Moin, 1998). Symbols: PIV measurements of Lourenco and Shih (□) (Lourenco and Shih, 1993) and hot wire measurements of Ong and Wallace (Ong and Wallace, 1996).

Fig. 7.13 - Turbulent circular cylinder flow: mean lateral velocity at different streamwise locations in the wake at Re = 3900. Lines: SFELES fine grid, SFELES coarse grid, and B-spline simulations by Kravchenko and Moin (Kravchenko and Moin, 1998). Symbols: PIV measurements of Lourenco and Shih (□) (Lourenco and Shih, 1993) and hot wire measurements of Ong and Wallace (Ong and Wallace, 1996).

Fig. 7.14 - Turbulent circular cylinder flow: time-averaged streamwise velocity fluctuations at different streamwise locations in the wake at Re = 3900. Lines: SFELES fine grid, SFELES coarse grid, and B-spline simulations by Kravchenko and Moin (Kravchenko and Moin, 1998). Symbols: PIV measurements of Lourenco and Shih (□) (Lourenco and Shih, 1993) and hot wire measurements of Ong and Wallace (Ong and Wallace, 1996).

Fig. 7.15 - Turbulent circular cylinder flow: time-averaged lateral velocity fluctuations at different streamwise locations in the wake at Re = 3900. Lines: SFELES fine grid, SFELES coarse grid, and B-spline simulations by Kravchenko and Moin (Kravchenko and Moin, 1998). Symbols: PIV measurements of Lourenco and Shih (□) (Lourenco and Shih, 1993) and hot wire measurements of Ong and Wallace (Ong and Wallace, 1996).

Fig. 7.16 - Turbulent circular cylinder flow: time-averaged Reynolds shear stress at different streamwise locations in the wake at Re = 3900. Lines: SFELES fine grid, SFELES coarse grid, and B-spline simulations by Kravchenko and Moin (Kravchenko and Moin, 1998). Symbols: PIV measurements of Lourenco and Shih (□) (Lourenco and Shih, 1993) and hot wire measurements of Ong and Wallace (Ong and Wallace, 1996).
7.2 Circular Cylinder With Wake Splitter Plate

7.2.1  Problem  Description 

A splitter plate attached to the downstream portion of a cylinder has long been used as a means of passive control of the wake flow characteristics. Roshko (Roshko, 1954) was the first to investigate this configuration. His experiments at $Re = 1.45 \times 10^4$ showed that a splitter plate of length L/D = 1.54 caused a reduction in shedding frequency. It was also seen that with a splitter plate of L/D = 5 the vortex shedding was completely suppressed and the pressure drag was reduced to 63% of that of the no splitter plate case. Gerrard (Gerrard, 1966) extended this work by investigating circular cylinders with splitter plates at $Re = 2.0 \times 10^4$. The Strouhal number was found to decrease for splitter plate lengths 0 < L/D < 1, and then increase until L/D = 2, the maximum length investigated. Even small splitter plates, L/D = 0.125, reduced the pressure drag to 83% of that of the no splitter plate case, and a maximum drag reduction to 68% was achieved at L/D = 1.

Apelt et al. (Apelt et al., 1973) and Apelt and West (Apelt and West, 1975) made significant contributions in their investigation at $Re = 10^4$ to $5 \times 10^4$ and splitter plate lengths up to L/D = 7. It was found that the addition of a splitter plate to the circular cylinder reduced the pressure drag significantly by stabilizing the shear layer separation points and producing a narrower wake. In addition, the base pressure was raised by as much as 50% and the Strouhal number was affected to a lesser degree. The maximum change in drag coefficient and wake width occurred at $L/D \approx 1$, agreeing with the observations of Gerrard (Gerrard, 1966). As with the studies of Roshko (Roshko, 1954), at L/D > 5 the vortex shedding was completely suppressed. It was also noted that the position of vortex formation moves downstream as L/D increases from 0 to 1, with the location being very near the end of the plate. At larger L/D, the formation position remains nearly the same with respect to the cylinder, and moves slightly away from the plate in the lateral direction.

Anderson and Szewczyk (Anderson and Szewczyk, 1997) investigated straight and sinuous splitter plate geometries using hot wire measurements and flow visualization. This study, at $Re = 3.5 \times 10^4$ to $4.6 \times 10^4$, shows a marked increase in Strouhal number for very small splitter plate lengths, L/D < 0.25, followed by a decrease for 0.25 < L/D < 0.8, and then an increase for L/D up to 1.5. In addition, a drag reduction was measured with increasing splitter plate length up to $L/D \approx 1$, again highlighting the significance of the L/D = 1 case. In this study a significant decrease in separation point movement was seen even for very small L/D, supporting the investigation of Apelt et al. (Apelt et al., 1973). Figure 7.17 from Anderson and Szewczyk (Anderson and Szewczyk, 1997) summarizes the four splitter plate regions that have been identified and are described below. Frame (a) shows the no splitter plate (L/D = 0) case, where the shear layers move freely in phase with the vortex shedding. In this case the Strouhal number is Reynolds number independent at $St \approx 0.20$. Frame (b) shows Region I, the Stabilizing Region (L/D < 0.25). The presence of the splitter plate stabilizes the separation points, and a progressive increase in Strouhal number is seen with L/D. The strength of the shear layers is increased in comparison to the L/D = 0 case, and consequently their interaction is increased in the near wake. Anderson and Szewczyk (Anderson and Szewczyk, 1997) term Region II, shown in frame (c), the Elongation Region (0.25 < L/D < 0.75). This region sees a progressive decrease in Strouhal number with L/D. It is characterized by a large shear layer interaction length, which gives rise to increased entrainment that reduces the shear layer interaction. Consequently, a progressive increase in formation length is seen with L/D. Region III, in frame (d), is termed the Reduced Entrainment Region (0.75 < L/D < 1.5). In this region the formation length remains constant, the interaction length decreases, and a progressive increase in Strouhal number is seen with L/D. The final region, the Splitter Plate-Vortex Interaction Region (L/D > 1.5), is shown in frame (e). The shear layer interaction is very small in this region and the Strouhal number increases with L/D.

Fig. 7.17 - Summary of the effects of a splitter plate on the formation region behind a circular
cylinder. From Anderson and Szewczyk (Anderson and Szewczyk, 1997), by permission.

Other experimental studies, for instance (Unal and Rockwell, 1987; Hasan and Budair,
1994; Cimbala and Leon, 1996), have investigated splitter plates detached from the cylinder
and splitter plates that are allowed to freely rotate. The details of these studies are not
discussed here, but the changes to the flow characteristics were found to be similar to those
for the attached splitter plate.

Numerical  studies  of  circular  cylinders  with  splitter  plates  are  rather  limited.  Kwon  and 
Choi (Kwon and Choi, 1995) studied laminar, 2D flow at Re = 80-160 using an FV method.
You et al. (You et al., 1998) also used an FV method to investigate laminar, 2D flow, this
time at Re = 100 and 160. These studies did not show the increase in Strouhal number for
very  small  L/D  found  by  Anderson  and  Szewczyk  (Anderson  and  Szewczyk,  1997).  They  do 
show,  however,  the  significance  of  the  L/D  =  1  case  in  terms  of  Strouhal  number  and  drag 
coefficient. 

In  this  study,  the  SFELES  solver  is  used  to  investigate  flow  at  Re  =  3900  and  splitter 
plate  lengths  of  L/D  =  0.25,  0.75,  and  1.50.  The  behavior  of  the  flowfield  in  terms  of  vortex 
formation location, wake geometry, shedding frequency, and drag coefficient is compared to
published  experimental  studies  performed  at  somewhat  higher  Reynolds  numbers. 

7.2.2  Computational  Mesh 

The computational domain and boundary conditions used for the splitter plate calculations
are identical to the no splitter plate case described in the previous section and pictured
in Figure 7.1. Due to computational time constraints, a grid refinement study was not feasible.
Considering the L/D = 0 case of the previous section, it was decided that the element
dimensions used in the coarse mesh were sufficient. The only adjustment made was to reduce
the spacing along the cylinder and splitter plate walls to 0.02D from 0.0275D. This refinement
along the boundary propagates into the shear layer and near-wake regions, providing
better resolution. The reader is referred to Section 7.1.2 for information regarding all other
element dimensions, including the boundary layer elements. The overall mesh statistics for
the three splitter plate cases as well as the L/D = 0 coarse grid case of the previous section
are given in Table 7.2. The increased resolution along the cylinder and splitter plate results
in a substantial increase in overall nodes, especially as the splitter plate length is increased.

Table 7.2 - Turbulent circular cylinder splitter plate flow: mesh statistics summary.

L/D     2D Elements   2D Nodes   Transverse Nodes   Overall Nodes
0.00    23,500        12,000     32                 380,000
0.25    33,300        16,900     32                 540,000
0.75    37,200        18,800     32                 602,000
1.50    43,200        21,800     32                 699,000

Figure 7.18 shows the computational mesh near the cylinder for the L/D = 0.25, 0.75, and
1.50 cases. Here the geometry of the splitter plate as well as the refinement near the walls
and in the near wake can be seen. The splitter plate thickness is ±D, consistent with the
experimental setup of Anderson and Szewczyk (Anderson and Szewczyk, 1997).
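
As a rough consistency check on Table 7.2, the overall node count should be close to the number of 2D nodes multiplied by the number of transverse nodes. The short Python sketch below illustrates this; the dictionary layout, the function names, and the 2% tolerance are assumptions made for illustration, and the table values themselves are rounded as printed.

```python
# Values transcribed from Table 7.2 (rounded as printed there).
MESH_STATS = {
    0.00: {"nodes_2d": 12_000, "transverse": 32, "overall": 380_000},
    0.25: {"nodes_2d": 16_900, "transverse": 32, "overall": 540_000},
    0.75: {"nodes_2d": 18_800, "transverse": 32, "overall": 602_000},
    1.50: {"nodes_2d": 21_800, "transverse": 32, "overall": 699_000},
}


def overall_nodes(nodes_2d, transverse):
    """Node count of a 2D mesh replicated on `transverse` planes."""
    return nodes_2d * transverse


def relative_mismatch(entry):
    """Relative difference between the product and the tabulated total."""
    estimate = overall_nodes(entry["nodes_2d"], entry["transverse"])
    return abs(estimate - entry["overall"]) / entry["overall"]


if __name__ == "__main__":
    for l_over_d, entry in MESH_STATS.items():
        estimate = overall_nodes(entry["nodes_2d"], entry["transverse"])
        print(f"L/D = {l_over_d:4.2f}: 2D nodes x planes = {estimate}, "
              f"tabulated = {entry['overall']}")
```

The products agree with the tabulated totals to within the rounding of the table entries.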


Fig.  7.18  -  Turbulent  circular  cylinder  splitter  plate  flow:  in-plane  mesh  near  the  cylinder  for 
(a)  L/D  =  0.25,  (b)  L/D  =  0.75,  and  (c)  L/D  =  1.50. 

7.2.3  Results 

Instantaneous  views  of  the  resulting  flowfields  are  shown  in  Figure  7.19  via  surfaces  of 
constant  vorticity  magnitude.  The  surfaces  are  colored  according  to  velocity  magnitude, 
showing the increased quiescence in the recirculation region brought about by even a small
splitter plate. As observed by Apelt et al. (Apelt et al., 1973) and Apelt and West (Apelt
and West, 1975) for Re ≈ 10^5, the splitter plate stabilizes the shear layer separation points
and produces a narrower wake. For all cases investigated the trend is that as L/D increases,
the wake narrows.

The  behavior  of  the  near  wake  region  is  more  closely  seen  in  Figure  7.20.  Here,  predicted 
surfaces of vorticity magnitude are compared with flow visualizations from Anderson (Anderson,
1994) at nearly the same Reynolds number (Re = 3700). Qualitatively, the length of the
formation  region  is  well  predicted  by  SFELES.  From  the  vorticity  magnitude  isosurfaces  it 
is  seen  that  increasing  L/D  produces  increasing  shear  layer  length  and  tighter  initial  roll-up. 
Referring  to  the  observations  of  Gerrard  (Gerrard,  1966)  and  Perry  et  al  (Perry  et  al.,  1982) 
regarding  the  physical  mechanisms  of  the  formation  region,  the  tighter  roll-up  of  the  shear 
layers  draws  less  fluid  from  the  opposite  side,  resulting  in  a  narrower  wake. 

Figure 7.21 shows pressure distributions around the cylinder for each splitter plate configuration.
These results show little change from the measurements of Anderson and Szewczyk
(Anderson and Szewczyk, 1997) at Re = 46,000 and Apelt et al. (Apelt et al., 1973) at
Re = 20,000. These profiles show that, though the separation point location does not change
significantly, the pressure behind this point varies with L/D. The trend is for increasing base
pressure with increasing L/D for L/D up to ≈ 1, and then decreasing with L/D for L/D > 1,
with the most significant variations occurring at small L/D.

Table 7.3 shows the variation in pressure drag (CDp), shedding frequency (St), and
formation length (Lf) with L/D. The quantities for L/D = 0 compare well with experimental
measurements as noted in the previous section. The pressure drag on the cylinder
decreases with increasing L/D for the L/D = 0.25 and L/D = 0.75 cases, then increases
from L/D = 0.75 to L/D = 1.50. This is consistent with the experimental observations
discussed previously, which indicate L/D ≈ 1 as the minimum-drag configuration. The shedding
frequency increases slightly from L/D = 0 to L/D = 0.25, then decreases to L/D = 0.75
before  increasing  to  L/D  =  1.50.  This  trend  agrees  with  the  experiments  of  Anderson  and 
Szewczyk  (Anderson  and  Szewczyk,  1997)  and  Apelt  et  al  (Apelt  et  al.,  1973)  at  higher 
Reynolds  numbers,  where  the  shedding  frequency  was  seen  to  increase  for  very  small  splitter 
plates  then  decrease  to  a  minimum  at  approximately  L/D  =  1  and  finally  increase  with 
L/D  for  L/D  >  1.  Measurements  from  Anderson  and  Szewczyk  (Anderson  and  Szewczyk, 
1997)  of  formation  length  as  a  function  of  L/D  show  a  substantial  increase  for  very  small 
plates,  followed  by  moderate  increases  in  Lf  for  increasing  L/D  until  around  L/D  =  1,  after 
which  Lf  remains  fairly  constant.  The  SFELES  results  for  Lf  in  Table  7.3  agree  with  this 
experimentally observed trend.

Fig. 7.19 - Turbulent circular cylinder splitter plate flow: instantaneous surfaces of constant
vorticity magnitude, ωD/U∞ = 2.5. Surfaces are colored according to velocity
magnitude, with blue representing low values and red representing high values.


8.  CONCLUSION 


In  this  concluding  chapter  the  contributions  of  this  work  are  summarily  presented  and 
the  future  of  the  work  discussed. 

8.1  Development  of  Navier-Stokes  Solvers 

The  ultimate  contribution  of  this  work  is  the  development  of  a  parallelized  LES  solver 
for  use  with  unstructured  meshes.  In  the  course  of  developing  this  solver,  two  additional 
solvers  were  developed  as  intermediate  steps  to  the  LES  solver. 

8.1.1  SFE2D 

SFE2D  is  a  2D,  incompressible,  laminar,  unsteady  solver.  The  spatial  discretization  is 
formulated  using  a  SUPG/PSPG  FE  approach.  This  formulation  has  been  heavily  developed 
in  the  last  decade  and  has  the  advantage  of  allowing  equal-order  elements  as  well  as  having 
convective  stability  without  the  over-diffusive  characteristics  of  typical  upwinding  schemes. 
Linear  triangle  elements  are  utilized,  resulting  in  second-order  accuracy.  As  a  result  of  this 
spatial  discretization,  very  complex  geometries  can  be  easily  accommodated  via  unstructured 
triangle  meshes. 

The  temporal  discretization  uses  a  consistent  mass  matrix  and  is  second-order  accurate. 
An implicit Crank-Nicolson scheme is used for the pressure and diffusion terms, while an
explicit  second-order  Adams-Bashforth  scheme  is  used  for  the  convective  terms.  Though  the 
explicit  treatment  of  the  convective  terms  introduces  a  stability  limit  on  the  time  step  size, 
it  affords  several  advantages  when  the  algorithm  is  extended  to  3D. 
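
The mixed implicit/explicit time stepping described above can be illustrated on a scalar model problem du/dt = lam*u + N(u), where the linear term stands in for the diffusion operator (Crank-Nicolson) and N(u) stands in for the convective term (second-order Adams-Bashforth). The sketch below is not the SFE2D implementation: the function name, the scalar setting, and the explicit Euler bootstrap on the first step are assumptions chosen only to keep the example self-contained.

```python
import math


def cn_ab2(lam, nonlinear, u0, dt, n_steps):
    """Advance du/dt = lam*u + nonlinear(u) with Crank-Nicolson / AB2.

    lam       : coefficient of the implicitly treated linear term
    nonlinear : callable giving the explicitly treated term N(u)
    """
    u = u0
    # Assumption: no history exists at the first step, so N^{n-1} is set
    # equal to N^n, which reduces the explicit part to forward Euler once.
    n_prev = nonlinear(u0)
    for _ in range(n_steps):
        n_curr = nonlinear(u)
        # (1 - dt*lam/2) u^{n+1} = (1 + dt*lam/2) u^n
        #                          + dt * (3/2 N^n - 1/2 N^{n-1})
        rhs = (1.0 + 0.5 * dt * lam) * u + dt * (1.5 * n_curr - 0.5 * n_prev)
        u = rhs / (1.0 - 0.5 * dt * lam)
        n_prev = n_curr
    return u


if __name__ == "__main__":
    # Linear stand-in for the explicit term, so an exact solution exists.
    exact = math.exp(-1.1)
    for dt, steps in ((0.01, 100), (0.005, 200)):
        u_end = cn_ab2(-1.0, lambda u: -0.1 * u, 1.0, dt, steps)
        print(f"dt = {dt}: error = {abs(u_end - exact):.3e}")
```

Halving the time step should reduce the error by roughly a factor of four, which is the behavior expected of a second-order scheme.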

Solution  of  the  algebraic  system  arising  at  each  time  step  is  performed  using  the  SPARSKIT 
toolkit.  This  kit  is  a  “black  box”  GMRES  iterative  solver  package  written  in  Fortran  77  that 
includes  compact  matrix  storage  formats  and  many  preconditioning  algorithms.  The  best 
performance  in  terms  of  robustness  and  speed  in  this  application  was  realized  using  the 
ILUT  preconditioner. 

SFE2D  was  validated  against  published  benchmark  lid-driven  cavity  numerical  results  at 
Reynolds numbers between 100 and 5000, as well as published experimental and numerical
results  for  backward-facing  step  flow  at  Reynolds  numbers  between  150  and  1000,  and  circular 
cylinder  flow  at  Reynolds  numbers  between  20  and  140. 

8.1.2  SFE3D 

SFE3D  is  a  parallelized  3D,  incompressible,  laminar,  unsteady  solver.  A  2D  geometry 
and periodic flow in the transverse direction are assumed. In keeping with this assumption,
an FE discretization in the 2D plane is combined with a spectral method in the
transverse direction. In order to reduce computational costs, all nonlinear terms are treated
using  a  pseudo-spectral  approach  where  the  terms  are  evaluated  in  physical  space  and  then 
transformed  into  Fourier  space.  The  in-plane  and  temporal  discretizations  are  identical  to 
the  SFE2D  solver  and  are  described  above.  The  use  of  explicit  temporal  treatment  for  the 
nonlinear  terms  decouples  the  matrix  equations  in  Fourier  space,  reducing  computational  cost 
and  allowing  for  a  novel  parallelization  scheme  where  the  work  is  partitioned  in  Fourier  space. 
The  parallelization  is  implemented  using  OpenMP,  an  emerging  standard  for  shared-memory 
parallelism  that  allows  parallelization  with  little  programming  overhead. 
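
The pseudo-spectral treatment of a nonlinear term can be sketched for a single product of two periodic fields: the Fourier coefficients are transformed to physical space, multiplied pointwise, and transformed back. The Python example below uses a naive discrete Fourier transform so that it has no dependencies; the function names and the normalization convention are assumptions, and no dealiasing is applied. It compares the result against the direct (circular) convolution of the coefficients, which is what the pointwise product replaces.

```python
import cmath


def dft(x):
    """Forward transform with 1/n normalization: x_j -> X_k."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n)) / n
            for k in range(n)]


def idft(coeffs):
    """Inverse transform matching dft: X_k -> x_j."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n))
            for j in range(n)]


def pseudo_spectral_product(u_hat, v_hat):
    """Evaluate the product in physical space, then return to Fourier space."""
    u = idft(u_hat)
    v = idft(v_hat)
    return dft([a * b for a, b in zip(u, v)])


def convolution_product(u_hat, v_hat):
    """Same product computed directly as a circular convolution, O(n^2)."""
    n = len(u_hat)
    return [sum(u_hat[m] * v_hat[(k - m) % n] for m in range(n)) for k in range(n)]


if __name__ == "__main__":
    u_hat = dft([1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5])
    v_hat = dft([0.5, -1.0, 2.0, 1.0, -0.5, 0.0, 1.0, 2.5])
    ps = pseudo_spectral_product(u_hat, v_hat)
    cv = convolution_product(u_hat, v_hat)
    print(max(abs(a - b) for a, b in zip(ps, cv)))
```

With a fast transform in place of the naive one used here, the physical-space route costs O(n log n) per product instead of O(n^2), which is the motivation for the pseudo-spectral approach.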

SFE3D  was  validated  against  published  numerical  and  experimental  results  for  flow 
past  a  circular  cylinder  at  Reynolds  numbers  195  and  300,  where  mode-A  and  mode-B  3D 
instabilities  are  present  in  the  wake,  respectively. 

8.1.3  SFELES 

SFELES is a parallelized incompressible LES solver. The classic Smagorinsky SGS
Reynolds  stress  model  with  van  Driest  damping  near  the  solid  walls  is  used.  Because  it 
is  a  direct  extension  of  SFE3D,  SFELES  utilizes  a  second-order  SUPG/PSPG  formulation  in 
the  2D  plane  and  a  spectral  formulation  in  the  transverse  direction.  The  temporal  accuracy  is 
second-order,  with  the  eddy  viscosity  terms  being  treated  using  an  explicit  Adams-Bashforth 
scheme. 
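
For orientation, a common textbook form of the Smagorinsky eddy viscosity with van Driest wall damping is nu_t = (C_s * Delta * D)^2 * |S|, with |S| = sqrt(2 S_ij S_ij) and D = 1 - exp(-y+/A+). The Python sketch below evaluates this form; the constants C_s = 0.1 and A+ = 25, the exact shape of the damping function, and all function names are assumptions and may differ from what is used in SFELES (see Section 6.3).

```python
import math


def strain_rate_magnitude(grad_u):
    """|S| = sqrt(2 S_ij S_ij) from a 3x3 velocity gradient du_i/dx_j."""
    total = 0.0
    for i in range(3):
        for j in range(3):
            s_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])
            total += s_ij * s_ij
    return math.sqrt(2.0 * total)


def smagorinsky_nu_t(grad_u, delta, y_plus, c_s=0.1, a_plus=25.0):
    """Eddy viscosity with van Driest damping (assumed constants)."""
    damping = 1.0 - math.exp(-y_plus / a_plus)  # -> 0 at the wall, -> 1 far away
    length_scale = c_s * delta * damping
    return length_scale * length_scale * strain_rate_magnitude(grad_u)


if __name__ == "__main__":
    # Simple shear du/dy = 2, so that |S| = 2.
    shear = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    for y_plus in (0.0, 5.0, 25.0, 1000.0):
        print(y_plus, smagorinsky_nu_t(shear, delta=0.5, y_plus=y_plus))
```

The damping factor drives the eddy viscosity to zero at the wall, where the undamped model would otherwise remain active.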

An  unstructured  mesh  in  the  2D  plane  allows  for  LES  calculations  to  be  run  on  complex 
2D  geometries  such  as  arbitrary  bluff  bodies,  arrays,  or  complex  multi-element  airfoils.  This 
capability  to  feasibly  perform  LES  on  unstructured  meshes  is  currently  very  rare  due  to  the 
high  computational  costs  associated  with  both  LES  and  unstructured  meshes. 

The SFELES solver was validated by comparison with published experimental and numerical
studies of flow past a circular cylinder at Re = 3900.

8.2  Turbulent  Circular  Cylinder  Flow  with  a  Wake  Splitter  Plate 

The value of unstructured mesh LES capabilities has been illustrated via a brief investigation
of turbulent flow past a circular cylinder with an attached wake splitter plate at
Re = 3900. This geometry configuration (assuming a finite-thickness splitter plate) cannot
be well modeled with a single-block structured mesh, but rather needs an unstructured or
block-structured mesh. To the author’s knowledge, no other numerical results exist to augment
the experimental measurements and observations of 3D turbulent flow past cylinders
with splitter plates. This work contributes information in terms of the physical mechanisms
of vortex formation in the presence of splitter plates as well as shedding frequency and drag
characteristics for multiple splitter plate lengths.

8.3 Future Work


This  work  provides  a  foundation  for  much  additional  research.  The  topics  currently 
envisioned  are  discussed  below. 

8.3.0.0.1 Extension of SFE2D to axisymmetric configurations.

The  SFE2D  solver  can  be  easily  extended  to  handle  axisymmetric  configurations,  thereby 
allowing  simulation  of  flows  in  geometries  such  as  pipes  and  nozzles. 

8.3.0.0.2  Parallelization  for  distributed  memory  architectures. 

The  parallelization  scheme  developed  in  this  work  is  well  suited  for  implementation  on 
shared-memory  computers  only.  The  recent  affordability  of  distributed  memory  systems  such 
as  PC  clusters  makes  this  type  of  parallelism  attractive  despite  the  relative  programming 
difficulty.  If  the  work  is  partitioned  in  physical  space  rather  than  Fourier  space,  parallel 
implementation on a distributed memory system should be rather straightforward.

8.3.0.0.3 Implementation of wall functions and additional boundary conditions.

In  its  current  form,  SFELES  contains  no  wall  functions  and  limited  boundary  conditions. 
The  addition  of  wall  functions  will  allow  higher  Reynolds  number  simulations  to  be  performed 
without  requiring  exorbitant  numbers  of  nodes  in  the  near-wall  region.  Boundary  conditions 
such  as  convective  outflow  and  far-field  inflow/outflow  will  also  enlarge  the  class  of  problems 
that  can  be  solved  with  this  solver. 

8.3.0.0.4 Development of feature detection algorithms.

The  ability  to  identify,  isolate,  and  study  coherent  structures  has  become  an  important 
part  of  turbulence  research.  The  SFELES  solver  provides  an  excellent  “numerical  laboratory” 
for  research  in  feature  detection  algorithms. 

8.3.0.0.5  Investigation  of  block  preconditioners. 

Block preconditioners have the potential of cutting the matrix solution time by as much
as an order of magnitude. The “black box” solvers investigated in this work (Aztec (Tuminaro
et al., 1999) and BILUT (Saad, 2001)) provided no performance improvement; in fact, the
matrix solution time was increased by as much as a factor of two. Further investigation of
block preconditioners with SFELES may prove very fruitful.

8.3.0.0.6 Implementation of additional SGS Reynolds stress models.

This  work  has  focused  on  the  development  of  the  FE/spectral  algorithm  rather  than  on 
achieving the most accurate LES solutions. The Smagorinsky model implemented in SFELES
is  a  very  basic  SGS  model,  and  has  a  number  of  shortcomings  (discussed  in  Section  6.3).  SGS 
model  development  is  an  area  of  intense  research,  and  SFELES  can  provide  an  excellent  test 
bed  for  application  of  these  models  to  complex  geometries. 

8.3.0.0.7 Application to airfoil problems.

High-lift aerodynamics has received much attention in the past decade, both in experimental
studies and numerical predictions. The difficulties in obtaining acceptable numerical
solutions are twofold. First, the multi-element geometry is difficult to mesh properly,
particularly when using structured grids. Second, the complex separation and reattachment
of the flow on multiple bodies is problematic for RANS turbulence models. Once sufficient
progress  has  been  made  in  terms  of  computational  speed,  boundary  conditions,  and  SGS 
models,  SFELES  will  be  well-suited  for  high-lift  airfoil  performance  prediction. 

8.3.0.0.8 Extension of SFELES to cylindrical coordinates.

The  FE/spectral  discretization  used  in  SFELES  can  easily  be  adapted  to  cylindrical 
coordinates, where an FE discretization is used in the x-r plane and a spectral method is used
in the θ direction. This would expand the class of problems that can be solved to include
turbulent flow in pipes and nozzles, including swirling flows such as vortex breakdown.

APPENDIX


A.  COMPRESSED  SPARSE  ROW  (CSR)  MATRIX  FORMAT 

The CSR format is the basic matrix storage format used in the SPARSKIT package,
which is used to solve the linear systems generated at each time step. In order to avoid extra
computational  cost  in  conversion  between  formats,  it  is  also  the  native  storage  format  utilized 
inside  the  SFELES  solver. 

The CSR data structure consists of three arrays: the double precision array A and two
integer arrays IA and JA. Consider the general M x N matrix

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
a_{M1} & a_{M2} & \cdots & a_{MN}
\end{bmatrix}
\]

with NNZ non-zero values. The matrix is stored as

A[1...NNZ],   where A[k]  = real value of the kth non-zero, stored row by row,
                            from row 1 to row M

JA[1...NNZ],  where JA[k] = column index corresponding to the value in A[k]

IA[1...M+1],  where IA[n] = pointer to the beginning of row n in the A and JA
                            arrays; the final entry, IA[M+1] = NNZ + 1, marks
                            the end of the last row


As an example, consider the 3x3 array with five non-zero values

\[
\begin{bmatrix}
1.0 & 2.0 &     \\
    & 3.0 & 4.0 \\
    &     & 5.0
\end{bmatrix} .
\]

The A array contains the nonzero values ordered row by row

A = [ 1.0  2.0  3.0  4.0  5.0 ]

The JA array contains the column indices corresponding to the A values

JA = [ 1  2  2  3  3 ]

Finally, the IA array points to the beginning indices of the rows of the matrix in A and JA

IA = [ 1  3  5  6 ]
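
The construction of the three CSR arrays can be sketched in a few lines of Python. The example below is illustrative only and is not taken from SPARSKIT or SFELES: the function names are hypothetical, and the arrays hold 1-based indices (as in Fortran) inside ordinary 0-based Python lists, so the matrix-vector product subtracts one when indexing.

```python
def dense_to_csr(dense):
    """Build 1-based CSR arrays (A, JA, IA) from a dense list of rows."""
    a, ja, ia = [], [], [1]
    for row in dense:
        for col, value in enumerate(row, start=1):
            if value != 0.0:
                a.append(float(value))
                ja.append(col)
        # Pointer to the start of the next row (one past the last entry).
        ia.append(len(a) + 1)
    return a, ja, ia


def csr_matvec(a, ja, ia, x):
    """Compute y = M x using only the stored non-zero entries."""
    y = []
    for n in range(len(ia) - 1):
        start, stop = ia[n] - 1, ia[n + 1] - 1  # convert to 0-based slices
        y.append(sum(a[k] * x[ja[k] - 1] for k in range(start, stop)))
    return y


if __name__ == "__main__":
    # Upper-bidiagonal 3x3 matrix with five non-zero values.
    matrix = [[1.0, 2.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 5.0]]
    a, ja, ia = dense_to_csr(matrix)
    print("A  =", a)    # [1.0, 2.0, 3.0, 4.0, 5.0]
    print("JA =", ja)   # [1, 2, 2, 3, 3]
    print("IA =", ia)   # [1, 3, 5, 6]
    print("M x =", csr_matvec(a, ja, ia, [1.0, 1.0, 1.0]))  # [3.0, 7.0, 5.0]
```

Only NNZ values, NNZ column indices, and one pointer per row plus one are stored, in place of the full dense array.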

Though this matrix storage format may be somewhat complex in terms of implementation,
it provides profound savings in terms of memory requirements for sparse matrices.

B. INPUT DATA FILE FORMAT (2D)


The  DPlot  (.dpi)  file  format,  a  rather  compact  and  versatile  ASCII  finite-element  format, 
is  used  as  the  input/output  format  for  the  2D  code.  The  format  is  described  below  via  the 
.dpi file listing of a simple example and the corresponding grid shown in Figure B.1.

Fig. B.1 - Mesh corresponding to the sample DPlot data file, where e denotes an element
and n denotes a node.

0.1000e+01

An explanation of each section is now given.


1.  A  text  header. 

2.  The  number  of  elements,  the  number  of  nodes,  and  the  number  of  boundary  faces. 

3. For each element, the number of nodes, the node numbers at each corner in the counterclockwise
sense, the neighboring elements with neighbor #1 being opposite node #1,
and a running counter.

4.  The  number  of  nodes. 

5. Four reference free-stream state quantities in conservation variables and two placeholders.

6.  For  each  node,  the  x  and  y  coordinates,  four  state  quantities  in  conservation  variables, 
and  a  running  counter. 

7.  The  number  of  boundary  segments,  the  number  of  corners  between  boundary  segments, 
the  node  numbers  at  each  corner,  and  the  left-oriented  boundary  name  for  each  corner, 
i.e.  the  name  of  the  boundary  oriented  with  the  domain  to  the  left,  having  this  node 
as its last node.

8. For each segment, the number of boundary faces along the segment and the number of
the  segment.  Each  segment  must  be  described  such  that  the  mesh  is  on  the  left. 

9.  For  each  boundary  face,  the  two  forming  nodes  in  opposite  sense  to  the  description  of 
the  segment,  the  boundary  condition  type  for  the  first  node,  the  boundary  condition 
type  for  the  second  node,  the  neighboring  element,  and  a  running  counter. 

10. An end-of-file indicator.

C. INPUT DATA FILE FORMAT (3D)


A modified version of the DPlot (.dpi) file format was developed for use as the input/output
format for the SFELES solver. The format is described below via the .dpi file
listing of a simple example. The 2D mesh is identical to the example mesh of Appendix B,
but in this case the flowfield variables are stored at four equi-spaced planes, as shown in
Figure C.1.


Fig. C.1 - Mesh corresponding to the sample modified DPlot data file, where e denotes an
element and n denotes a node.

(10)


7  6  117  1 

8  7  11  8  2 

0  -1 


An  explanation  of  each  section  is  now  given. 

1.  A  text  header. 

2.  The  number  of  elements,  an  unused  placeholder,  and  the  number  of  2D  planes  (Fourier 
modes). 

3.  For  each  2D  element,  the  number  of  nodes,  the  node  numbers  at  each  corner  in  the 
counterclockwise  sense,  the  neighboring  elements  with  neighbor  #1  being  opposite  node 
#1,  and  a  running  counter. 

4.  The  number  of  nodes. 

5. Four reference free-stream state quantities in conservation variables and two placeholders.

6. For each node, the x and y coordinates, four state quantities in conservation variables
for each two-dimensional plane and time steps n and n - 1 (i.e. the four variables first
for plane 0 and time step n, then plane 1 and time step n, ..., plane 3 and time step n,
plane 0 and time step n - 1, ..., plane 3 and time step n - 1), and a running counter.

7.  The  number  of  boundary  segments,  the  number  of  corners  between  boundary  segments, 
the  node  numbers  at  each  corner,  and  the  left-oriented  boundary  name  for  each  corner, 
i.e.  the  name  of  the  boundary  oriented  with  the  domain  to  the  left,  having  this  node 
as  its  last  node. 

8.  For  each  segment,  the  number  of  boundary  faces  along  the  segment  and  the  number  of 
the  segment.  Each  segment  must  be  described  such  that  the  mesh  is  on  the  left. 

9.  For  each  boundary  face,  the  two  forming  nodes  in  opposite  sense  to  the  description  of 
the  segment,  the  boundary  condition  type  for  the  first  node,  the  boundary  condition 
type  for  the  second  node,  the  neighboring  element,  and  a  running  counter. 

10.  An  end-of-file  indicator. 
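
The ordering of the per-node state block described in item 6 (variables fastest, then planes, then time levels) can be expressed as a simple offset calculation. The Python sketch below is illustrative only: the function name, the 0-based indices, and the convention that time level 0 means step n and time level 1 means step n - 1 are assumptions, not part of the file format specification.

```python
N_VARS = 4  # four state quantities per plane and time level


def state_offset(plane, time_level, var, n_planes):
    """0-based position of a value inside one node's state block.

    plane      : 0 .. n_planes - 1
    time_level : 0 for time step n, 1 for time step n - 1 (assumed convention)
    var        : 0 .. N_VARS - 1
    """
    if not (0 <= plane < n_planes):
        raise ValueError("plane out of range")
    if time_level not in (0, 1):
        raise ValueError("time_level must be 0 (step n) or 1 (step n - 1)")
    if not (0 <= var < N_VARS):
        raise ValueError("var out of range")
    # All planes at step n come first, followed by all planes at step n - 1.
    return (time_level * n_planes + plane) * N_VARS + var


if __name__ == "__main__":
    n_planes = 4
    print(state_offset(0, 0, 0, n_planes))  # first value: plane 0, step n
    print(state_offset(3, 0, 3, n_planes))  # last value at step n
    print(state_offset(0, 1, 0, n_planes))  # first value at step n - 1
    print(state_offset(3, 1, 3, n_planes))  # last value in the block
```

For the four-plane example, each node therefore carries 2 x 4 x 4 = 32 state values between its coordinates and its running counter.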
D. COMMUTATIVE PROPERTIES OF THE LES FILTER OPERATION


The  LES  filtering  operation  given  in  (6.2)  is  repeated  here: 

\[
\bar{f}(x,t) = \int_{\Omega} f(r,t)\, G(x - r)\, dr . \tag{D.1}
\]

In  the  derivation  of  the  LES  governing  equations  (Section  6.2),  it  was  stated  that  the  filtering 
operation  commutes  with  both  spatial  and  temporal  derivatives,  as  is  now  shown. 

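First, the temporal derivative,

\[
\frac{\partial \bar{f}}{\partial t} = \overline{\frac{\partial f}{\partial t}} . \tag{D.2}
\]

Substituting in the definition of the filter operation (D.1) gives

\[
\frac{\partial}{\partial t} \int_{\Omega} G(x_i - r_i)\, f(r_i,t)\, dr_i
  = \int_{\Omega} G(x_i - r_i)\, \frac{\partial f(r_i,t)}{\partial t}\, dr_i . \tag{D.3}
\]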


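Assuming the integrand on the LHS is temporally continuous, the derivative can be moved
inside the integral:

\[
\int_{\Omega} \frac{\partial}{\partial t} \left[ G(x_i - r_i)\, f(r_i,t) \right] dr_i
  = \int_{\Omega} G(x_i - r_i)\, \frac{\partial f(r_i,t)}{\partial t}\, dr_i . \tag{D.4}
\]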


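Finally, because the filter function G is not a function of time we see that

\[
\int_{\Omega} G(x_i - r_i)\, \frac{\partial f(r_i,t)}{\partial t}\, dr_i
  = \int_{\Omega} G(x_i - r_i)\, \frac{\partial f(r_i,t)}{\partial t}\, dr_i \tag{D.5}
\]

and the filtering operation does, in fact, commute with temporal derivatives.

Now the commutative property of the spatial derivative is verified:

\[
\frac{\partial \bar{f}}{\partial x_i} = \overline{\frac{\partial f}{\partial x_i}} . \tag{D.6}
\]

Again, substituting in the definition of the filter operation (D.1) gives

\[
\frac{\partial}{\partial x_i} \int_{\Omega} G(x_i - r_i)\, f(r_i,t)\, dr_i
  = \int_{\Omega} G(x_i - r_i)\, \frac{\partial f(r_i,t)}{\partial r_i}\, dr_i . \tag{D.7}
\]

Assuming the integrand on the LHS is spatially continuous, the derivative can be moved
inside the integral. The RHS can be integrated by parts to obtain

\[
\int_{\Omega} \frac{\partial}{\partial x_i} \left[ G(x_i - r_i)\, f(r_i,t) \right] dr_i
  = \left[ G(x_i - r_i)\, f(r_i,t) \right]_{\Gamma}
  - \int_{\Omega} \frac{\partial G(x_i - r_i)}{\partial r_i}\, f(r_i,t)\, dr_i , \tag{D.9}
\]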

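where $\Gamma$ denotes the boundary of the domain. This term is identically zero as long as G
is zero on the boundary. Dropping this term and rearranging the LHS gives

\[
\int_{\Omega} \frac{\partial G(x_i - r_i)}{\partial x_i}\, f(r_i,t)\, dr_i
  = - \int_{\Omega} \frac{\partial G(x_i - r_i)}{\partial r_i}\, f(r_i,t)\, dr_i . \tag{D.10}
\]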


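Because $G = G(x_i - r_i)$, we have $\partial G/\partial x_i = -\partial G/\partial r_i$. Applying this gives the desired result

\[
\int_{\Omega} \frac{\partial G(x_i - r_i)}{\partial x_i}\, f(r_i,t)\, dr_i
  = \int_{\Omega} \frac{\partial G(x_i - r_i)}{\partial x_i}\, f(r_i,t)\, dr_i , \tag{D.11}
\]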


and  we  see  that  the  filtering  operation  does,  in  fact,  commute  with  spatial  derivatives. 
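
A discrete analogue of the spatial commutation property can be checked numerically. The Python sketch below applies a uniform-width box filter and a central-difference derivative on a periodic grid in both orders and compares the results. The periodic domain (so that no boundary term arises), the box filter, the grid size, and the function names are all assumptions made for illustration; the check does not cover filters whose width varies in space.

```python
import math


def box_filter(f, half_width):
    """Periodic top-hat filter averaging 2*half_width + 1 neighboring points."""
    n = len(f)
    width = 2 * half_width + 1
    return [sum(f[(j + m) % n] for m in range(-half_width, half_width + 1)) / width
            for j in range(n)]


def central_diff(f, dx):
    """Periodic second-order central difference."""
    n = len(f)
    return [(f[(j + 1) % n] - f[(j - 1) % n]) / (2.0 * dx) for j in range(n)]


def commutation_error(f, dx, half_width):
    """Max difference between filter(d/dx f) and d/dx filter(f)."""
    filtered_derivative = box_filter(central_diff(f, dx), half_width)
    derivative_of_filtered = central_diff(box_filter(f, half_width), dx)
    return max(abs(a - b) for a, b in zip(filtered_derivative, derivative_of_filtered))


if __name__ == "__main__":
    n = 64
    dx = 2.0 * math.pi / n
    field = [math.sin(j * dx) + 0.5 * math.cos(3.0 * j * dx) for j in range(n)]
    print(commutation_error(field, dx, half_width=2))
```

The two results agree to within floating-point round-off, as expected when the filter kernel depends only on the separation between points.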